Records | |||||
---|---|---|---|---|---|
Author | Diego Cheda; Daniel Ponsa; Antonio Lopez | ||||
Title | Pedestrian Candidates Generation using Monocular Cues | Type | Conference Article | ||
Year | 2012 | Publication | IEEE Intelligent Vehicles Symposium | Abbreviated Journal | |
Volume | Issue | Pages | 7-12 | ||
Keywords | pedestrian detection | ||||
Abstract |
Common techniques for pedestrian candidates generation (e.g., sliding window approaches) are based on an exhaustive search over the image. This implies that the number of windows produced is huge, which translates into a significant time consumption in the classification stage. In this paper, we propose a method that significantly reduces the number of windows to be considered by a classifier. Our method is a monocular one that exploits geometric and depth information available in single images. Both representations of the world are fused together to generate pedestrian candidates based on an underlying model which is focused only on objects standing vertically on the ground plane and having a certain height, according to their depth in the scene. We evaluate our algorithm on a challenging dataset and demonstrate its application for pedestrian detection, where a considerable reduction in the number of candidate windows is reached. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | IEEE Xplore | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1931-0587 | ISBN | 978-1-4673-2119-8 | Medium | |
Area | Expedition | Conference | IV | ||
Notes | ADAS | Approved | no | ||
Call Number | Admin @ si @ CPL2012c; ADAS @ adas @ cpl2012d | Serial | 2013 | ||
Permanent link to this record | |||||
Author | Sergio Escalera; Markus Weimer; Mikhail Burtsev; Valentin Malykh; Varvara Logacheva; Ryan Lowe; Iulian Vlad Serban; Yoshua Bengio; Alexander Rudnicky; Alan W. Black; Shrimai Prabhumoye; Łukasz Kidzinski; Mohanty Sharada; Carmichael Ong; Jennifer Hicks; Sergey Levine; Marcel Salathe; Scott Delp; Iker Huerga; Alexander Grigorenko; Leifur Thorbergsson; Anasuya Das; Kyla Nemitz; Jenna Sandker; Stephen King; Alexander S. Ecker; Leon A. Gatys; Matthias Bethge; Jordan Boyd Graber; Shi Feng; Pedro Rodriguez; Mohit Iyyer; He He; Hal Daume III; Sean McGregor; Amir Banifatemi; Alexey Kurakin; Ian Goodfellow; Samy Bengio | ||||
Title | Introduction to NIPS 2017 Competition Track | Type | Book Chapter | ||
Year | 2018 | Publication | The NIPS ’17 Competition: Building Intelligent Systems | Abbreviated Journal | |
Volume | Issue | Pages | 1-23 | ||
Keywords | |||||
Abstract |
Competitions have become a popular tool in the data science community to solve hard problems, assess the state of the art and spur new research directions. Companies like Kaggle and open source platforms like Codalab connect people with data and a data science problem to those with the skills and means to solve it. Hence, the question arises: What, if anything, could NIPS add to this rich ecosystem?
In 2017, we embarked to find out. We attracted 23 potential competitions, of which we selected five to be NIPS 2017 competitions. Our final selection features competitions advancing the state of the art in other sciences such as “Classifying Clinically Actionable Genetic Mutations” and “Learning to Run”. Others, like “The Conversational Intelligence Challenge” and “Adversarial Attacks and Defences” generated new data sets that we expect to impact the progress in their respective communities for years to come. And “Human-Computer Question Answering Competition” showed us just how far we as a field have come in ability and efficiency since the breakthrough performance of Watson in Jeopardy. Two additional competitions, DeepArt and AI XPRIZE Milestones, were also associated with the NIPS 2017 competition track; their results are also presented in this chapter. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | Sergio Escalera; Markus Weimer | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-319-94042-7 | Medium | ||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ EWB2018 | Serial | 3200 | ||
Permanent link to this record | |||||
Author | Jaume Garcia; Albert Andaluz; Debora Gil; Francesc Carreras | ||||
Title | Decoupled External Forces in a Predictor-Corrector Segmentation Scheme for LV Contours in Tagged MR Images | Type | Conference Article | ||
Year | 2010 | Publication | 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society | Abbreviated Journal | |
Volume | Issue | Pages | 4805-4808 | ||
Keywords | |||||
Abstract |
Computation of functional regional scores requires proper identification of LV contours. On one hand, manual segmentation is robust, but it is time consuming and requires high expertise. On the other hand, the tag pattern in TMR sequences is a problem for automatic segmentation of LV boundaries. We propose a segmentation method based on a predictor-corrector (Active Contours – Shape Models) scheme. Special stress is put on the definition of the AC external forces. First, we introduce a semantic description of the LV that discriminates myocardial tissue by using texture and motion descriptors. Second, in order to ensure convergence regardless of the initial contour, the external energy is decoupled according to the orientation of the edges in the image potential. We have validated the model in terms of error in segmented contours and accuracy of regional clinical scores. | ||||
Address | Buenos Aires (Argentina) | ||||
Corporate Author | IEEE EMB | Thesis | |||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1557-170X | ISBN | 978-1-4244-4123-5 | Medium | |
Area | Expedition | Conference | EMBC | ||
Notes | IAM | Approved | no | ||
Call Number | IAM @ iam @ GAG2010 | Serial | 1514 | ||
Permanent link to this record | |||||
Author | Arjan Gijsenij; Theo Gevers; Joost Van de Weijer | ||||
Title | Computational Color Constancy: Survey and Experiments | Type | Journal Article | ||
Year | 2011 | Publication | IEEE Transactions on Image Processing | Abbreviated Journal | TIP |
Volume | 20 | Issue | 9 | Pages | 2475-2489 |
Keywords | computational color constancy;computer vision application;gamut-based method;learning-based method;static method;colour vision;computer vision;image colour analysis;learning (artificial intelligence);lighting | ||||
Abstract |
Computational color constancy is a fundamental prerequisite for many computer vision applications. This paper presents a survey of many recent developments and state-of-the-art methods. Several criteria are proposed that are used to assess the approaches. A taxonomy of existing algorithms is proposed and methods are separated into three groups: static methods, gamut-based methods and learning-based methods. Further, the experimental setup is discussed including an overview of publicly available data sets. Finally, various freely available methods, of which some are considered to be state-of-the-art, are evaluated on two data sets. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1057-7149 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE;CIC | Approved | no | ||
Call Number | Admin @ si @ GGW2011 | Serial | 1717 | ||
Permanent link to this record | |||||
Author | Debora Gil; F. Javier Sanchez; Gloria Fernandez Esparrach; Jorge Bernal | ||||
Title | 3D Stable Spatio-temporal Polyp Localization in Colonoscopy Videos | Type | Book Chapter | ||
Year | 2015 | Publication | Computer-Assisted and Robotic Endoscopy. Revised selected papers of Second International Workshop, CARE 2015, Held in Conjunction with MICCAI 2015 | Abbreviated Journal | |
Volume | 9515 | Issue | Pages | 140-152 | |
Keywords | Colonoscopy, Polyp Detection, Polyp Localization, Region Extraction, Watersheds | ||||
Abstract |
Computational intelligent systems could reduce polyp miss rate in colonoscopy for colon cancer diagnosis and, thus, increase the efficiency of the procedure. One of the main problems of existing polyp localization methods is a lack of spatio-temporal stability in their response. We propose to explore the response of a given polyp localization across temporal windows in order to select those image regions presenting the highest stable spatio-temporal response. Spatio-temporal stability is achieved by extracting 3D watershed regions on the temporal window. Stability in localization response is statistically determined by analysis of the variance of the output of the localization method inside each 3D region. We have explored the benefits of considering spatio-temporal stability in two different tasks: polyp localization and polyp detection. Experimental results indicate an average improvement of 21.5% in polyp localization and 43.78% in polyp detection. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CARE | ||
Notes | IAM; MV; 600.075 | Approved | no | ||
Call Number | Admin @ si @ GSF2015 | Serial | 2733 | ||
Permanent link to this record | |||||
Author | Debora Gil; Agnes Borras; Ruth Aris; Mariano Vazquez; Pierre Lafortune; Guillaume Houzeaux | ||||
Title | What a difference in biomechanics cardiac fiber makes | Type | Conference Article | ||
Year | 2012 | Publication | Statistical Atlases And Computational Models Of The Heart: Imaging and Modelling Challenges | Abbreviated Journal | |
Volume | 7746 | Issue | Pages | 253-260 | |
Keywords | |||||
Abstract |
Computational simulations of the heart are a powerful tool for a comprehensive understanding of cardiac function and its intrinsic relationship with its muscular architecture. Cardiac biomechanical models require a vector field representing the orientation of cardiac fibers. A wrong orientation of the fibers can lead to a non-realistic simulation of the heart functionality. In this paper we explore the impact of the fiber information on the simulated biomechanics of cardiac muscular anatomy. We have used the Johns Hopkins database to perform a biomechanical simulation using both a synthetic benchmark fiber distribution and the data obtained experimentally from DTI. Results illustrate how differences in fiber orientation affect heart deformation along the cardiac cycle. | ||||
Address | Nice, France | ||||
Corporate Author | Thesis | ||||
Publisher | Springer Berlin Heidelberg | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0302-9743 | ISBN | 978-3-642-36960-5 | Medium | |
Area | Expedition | Conference | STACOM | ||
Notes | IAM | Approved | no | ||
Call Number | IAM @ iam @ GBA2012 | Serial | 1987 | ||
Permanent link to this record | |||||
Author | Fahad Shahbaz Khan; Shida Beigpour; Joost Van de Weijer; Michael Felsberg | ||||
Title | Painting-91: A Large Scale Database for Computational Painting Categorization | Type | Journal Article | ||
Year | 2014 | Publication | Machine Vision and Applications | Abbreviated Journal | MVAP |
Volume | 25 | Issue | 6 | Pages | 1385-1397 |
Keywords | |||||
Abstract |
Computer analysis of visual art, especially paintings, is an interesting cross-disciplinary research domain. Most of the research in the analysis of paintings involves small to medium-sized datasets with their own specific settings. Interestingly, significant progress has been made in the field of object and scene recognition lately. A key factor in this success is the introduction and availability of benchmark datasets for evaluation. Surprisingly, such a benchmark setup is still missing in the area of computational painting categorization. In this work, we propose a novel large scale dataset of digital paintings. The dataset consists of paintings from 91 different painters. We further show three applications of our dataset, namely artist categorization, style classification and saliency detection. We investigate how local and global features popular in image classification perform for the tasks of artist and style categorization. For both categorization tasks, our experimental results suggest that combining multiple features significantly improves the final performance. We show that state-of-the-art computer vision methods can correctly attribute 50 % of unseen paintings to their painter in a large dataset and correctly identify their artistic style in over 60 % of the cases. Additionally, we explore the task of saliency detection on paintings and show experimental findings using state-of-the-art saliency estimation algorithms. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer Berlin Heidelberg | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0932-8092 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | CIC; LAMP; 600.074; 600.079 | Approved | no | ||
Call Number | Admin @ si @ KBW2014 | Serial | 2510 | ||
Permanent link to this record | |||||
Author | Vacit Oguz Yazici; Longlong Yu; Arnau Ramisa; Luis Herranz; Joost Van de Weijer | ||||
Title | Main product detection with graph networks for fashion | Type | Journal Article | ||
Year | 2024 | Publication | Multimedia Tools and Applications | Abbreviated Journal | MTAP |
Volume | 83 | Issue | Pages | 3215–3231 | |
Keywords | |||||
Abstract |
Computer vision has established a foothold in the online fashion retail industry. Main product detection is a crucial step of vision-based fashion product feed parsing pipelines, focused on identifying the bounding boxes that contain the product being sold in the gallery of images of the product page. The current state-of-the-art approach does not leverage the relations between regions in the image, and treats images of the same product independently, therefore not fully exploiting visual and product contextual information. In this paper, we propose a model that incorporates Graph Convolutional Networks (GCN) that jointly represent all detected bounding boxes in the gallery as nodes. We show that the proposed method is better than the state-of-the-art. In particular, in the scenario where title-input is missing at inference time and in cross-dataset evaluation, our method outperforms previous approaches by a large margin. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; MACO; 600.147; 600.167; 600.164; 600.161; 600.141; 601.309 | Approved | no | ||
Call Number | Admin @ si @ YYR2024 | Serial | 4017 | ||
Permanent link to this record | |||||
Author | Marc Masana | ||||
Title | Lifelong Learning of Neural Networks: Detecting Novelty and Adapting to New Domains without Forgetting | Type | Book Whole | ||
Year | 2020 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract |
Computer vision has gone through considerable changes in the last decade as neural networks have come into common use. As available computational capabilities have grown, neural networks have achieved breakthroughs in many computer vision tasks, and have even surpassed human performance in others. With accuracy being so high, focus has shifted to other issues and challenges. One research direction that saw a notable increase in interest is on lifelong learning systems. Such systems should be capable of efficiently performing tasks, identifying and learning new ones, and should moreover be able to deploy smaller versions of themselves which are experts on specific tasks. In this thesis, we contribute to research on lifelong learning and address the compression and adaptation of networks to small target domains, the incremental learning of networks faced with a variety of tasks, and finally the detection of out-of-distribution samples at inference time.
We explore how knowledge can be transferred from large pretrained models to more task-specific networks capable of running on smaller devices by extracting the most relevant information. Using a pretrained model provides more robust representations and a more stable initialization when learning a smaller task, which leads to higher performance and is known as domain adaptation. However, those models are too large for certain applications that need to be deployed on devices with limited memory and computational capacity. In this thesis we show that, after performing domain adaptation, some learned activations barely contribute to the predictions of the model. Therefore, we propose to apply network compression based on low-rank matrix decomposition using the activation statistics. This results in a significant reduction of the model size and the computational cost. Like human intelligence, machine intelligence aims to have the ability to learn and remember knowledge. However, when a trained neural network is presented with learning a new task, it ends up forgetting previous ones. This is known as catastrophic forgetting and its avoidance is studied in continual learning. The work presented in this thesis extensively surveys continual learning techniques and presents an approach to avoid catastrophic forgetting in sequential task learning scenarios. Our technique is based on using ternary masks in order to update a network to new tasks, reusing the knowledge of previous ones while not forgetting anything about them. In contrast to earlier work, our masks are applied to the activations of each layer instead of the weights. This considerably reduces the number of parameters to be added for each new task. 
Furthermore, the analysis of a wide range of work on incremental learning without access to the task-ID provides insight into current state-of-the-art approaches that focus on avoiding catastrophic forgetting by using regularization, rehearsal of previous tasks from a small memory, or compensating the task-recency bias. Neural networks trained with a cross-entropy loss force the outputs of the model to tend toward a one-hot encoded vector. This leads to models being overly confident when presented with images or classes that were not present in the training distribution. The capacity of a system to be aware of the boundaries of the learned tasks and identify anomalies or classes which have not been learned yet is key to lifelong learning and autonomous systems. In this thesis, we present a metric learning approach to out-of-distribution detection that learns the task at hand on an embedding space. | ||||
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Joost Van de Weijer; Andrew Bagdanov | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-121011-9-5 | Medium | ||
Area | Expedition | Conference | |||
Notes | LAMP; 600.120 | Approved | no | ||
Call Number | Admin @ si @ Mas20 | Serial | 3481 | ||
Permanent link to this record | |||||
Author | Javier Vazquez; Maria Vanrell; Robert Benavente | ||||
Title | Color names as a constraint for Computer Vision problems | Type | Conference Article | ||
Year | 2010 | Publication | Proceedings of The CREATE 2010 Conference | Abbreviated Journal | |
Volume | Issue | Pages | 324–328 | ||
Keywords | |||||
Abstract |
Computer Vision problems are usually ill-posed. Constraining the gamut of possible solutions is then a necessary step. Many constraints for different problems have been developed over the years. In this paper, we present a different way of constraining some of these problems: the use of color names. In particular, we will focus on segmentation, representation and constancy. | ||||
Address | Gjovik (Norway) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CREATE | ||
Notes | CIC | Approved | no | ||
Call Number | CAT @ cat @ VVB2010 | Serial | 1328 | ||
Permanent link to this record | |||||
Author | Antonio Lopez; Gabriel Villalonga; Laura Sellart; German Ros; David Vazquez; Jiaolong Xu; Javier Marin; Azadeh S. Mozafari | ||||
Title | Training my car to see using virtual worlds | Type | Journal Article | ||
Year | 2017 | Publication | Image and Vision Computing | Abbreviated Journal | IMAVIS |
Volume | 38 | Issue | Pages | 102-118 | |
Keywords | |||||
Abstract |
Computer vision technologies are at the core of different advanced driver assistance systems (ADAS) and will play a key role in oncoming autonomous vehicles too. One of the main challenges for such technologies is to perceive the driving environment, i.e. to detect and track relevant driving information in a reliable manner (e.g. pedestrians in the vehicle route, free space to drive through). Nowadays it is clear that machine learning techniques are essential for developing such a visual perception for driving. In particular, the standard working pipeline consists of collecting data (i.e. on-board images), manually annotating the data (e.g. drawing bounding boxes around pedestrians), learning a discriminative data representation taking advantage of such annotations (e.g. a deformable part-based model, a deep convolutional neural network), and then assessing the reliability of such representation with the acquired data. In the last two decades most of the research efforts focused on representation learning (first, designing descriptors and learning classifiers; later doing it end-to-end). Hence, collecting data and, especially, annotating it, is essential for learning good representations. While this has been the case from the very beginning, it was only after the disruptive appearance of deep convolutional neural networks that it became a serious issue, due to their data-hungry nature. In this context, the problem is that manual data annotation is tiresome work prone to errors. Accordingly, in the late 00’s we initiated a research line consisting of training visual models using photo-realistic computer graphics, especially focusing on assisted and autonomous driving. In this paper, we summarize this work and show how it has become a new tendency with increasing acceptance. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.118 | Approved | no | ||
Call Number | Admin @ si @ LVS2017 | Serial | 2985 | ||
Permanent link to this record | |||||
Author | Cristina Sanchez Montes; Jorge Bernal; Ana Garcia Rodriguez; Henry Cordova; Gloria Fernandez Esparrach | ||||
Title | Revisión de métodos computacionales de detección y clasificación de pólipos en imagen de colonoscopia | Type | Journal Article | ||
Year | 2020 | Publication | Gastroenterología y Hepatología | Abbreviated Journal | GH |
Volume | 43 | Issue | 4 | Pages | 222-232 |
Keywords | |||||
Abstract |
Computer-aided diagnosis (CAD) is a tool with great potential to help endoscopists in the tasks of detecting and histologically classifying colorectal polyps. In recent years, different technologies have been described and their potential utility has been increasingly evidenced, which has generated great expectations among scientific societies. However, most of these works are retrospective and use images of different quality and characteristics which are analysed off line. This review aims to familiarise gastroenterologists with computational methods and the particularities of endoscopic imaging, which have an impact on image processing analysis. Finally, the publicly available image databases, needed to compare and confirm the results obtained with different methods, are presented. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MV; | Approved | no | ||
Call Number | Admin @ si @ SBG2020 | Serial | 3404 | ||
Permanent link to this record | |||||
Author | Juan Ramon Terven Salinas; Joaquin Salas; Bogdan Raducanu | ||||
Title | New Opportunities for Computer Vision-Based Assistive Technology Systems for the Visually Impaired | Type | Journal Article | ||
Year | 2014 | Publication | Computer | Abbreviated Journal | COMP |
Volume | 47 | Issue | 4 | Pages | 52-58 |
Keywords | |||||
Abstract |
Computing advances and increased smartphone use give technology system designers greater flexibility in exploiting computer vision to support visually impaired users. Understanding these users' needs will certainly provide insight for the development of improved usability of computing devices. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0018-9162 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | LAMP; | Approved | no | ||
Call Number | Admin @ si @ TSR2014a | Serial | 2317 | ||
Permanent link to this record | |||||
Author | German Barquero; Sergio Escalera; Cristina Palmero | ||||
Title | Seamless Human Motion Composition with Blended Positional Encodings | Type | Miscellaneous | ||
Year | 2024 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract |
Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ BEP2024 | Serial | 4022 | ||
Permanent link to this record | |||||
Author | Yaxing Wang; Joost Van de Weijer; Lu Yu; Shangling Jui | ||||
Title | Distilling GANs with Style-Mixed Triplets for X2I Translation with Limited Data | Type | Conference Article | ||
Year | 2022 | Publication | 10th International Conference on Learning Representations | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract |
Conditional image synthesis is an integral part of many X2I translation systems, including image-to-image, text-to-image and audio-to-image translation systems. Training these large systems generally requires huge amounts of training data.
Therefore, we investigate knowledge distillation to transfer knowledge from a high-quality unconditioned generative model (e.g., StyleGAN) to conditioned synthetic image generation modules in a variety of systems. To initialize the conditional and reference branch (from an unconditional GAN) we exploit the style mixing characteristics of high-quality GANs to generate an infinite supply of style-mixed triplets to perform the knowledge distillation. Extensive experimental results in a number of image generation tasks (i.e., image-to-image, semantic segmentation-to-image, text-to-image and audio-to-image) demonstrate qualitatively and quantitatively that our method successfully transfers knowledge to the synthetic image generation modules, resulting in more realistic images than previous methods as confirmed by a significant drop in the FID. | ||||
Address | Virtual | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICLR | ||
Notes | LAMP; 600.147 | Approved | no | ||
Call Number | Admin @ si @ WWY2022 | Serial | 3791 | ||
Permanent link to this record |