Home | << 1 2 3 4 5 6 7 8 9 >> |
Records | |||||
---|---|---|---|---|---|
Author | Yaxing Wang; Hector Laria Mantecon; Joost Van de Weijer; Laura Lopez-Fuentes; Bogdan Raducanu | ||||
Title | TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets | Type | Conference Article | ||
Year | 2021 | Publication | 19th IEEE International Conference on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 13990-13999 | ||
Keywords | |||||
Abstract | Image-to-image (I2I) translation has matured in recent years and is able to generate high-quality realistic images. However, despite current success, it still faces important challenges when applied to small domains. Existing methods use transfer learning for I2I translation, but they still require the learning of millions of parameters from scratch. This drawback severely limits its application on small domains. In this paper, we propose a new transfer learning for I2I translation (TransferI2I). We decouple our learning process into the image generation step and the I2I translation step. In the first step we propose two novel techniques: source-target initialization and self-initialization of the adaptor layer. The former finetunes the pretrained generative model (e.g., StyleGAN) on source and target data. The latter allows to initialize all non-pretrained network parameters without the need of any data. These techniques provide a better initialization for the I2I translation step. In addition, we introduce an auxiliary GAN that further facilitates the training of deep I2I systems even from small datasets. In extensive experiments on three datasets, (Animal faces, Birds, and Foods), we show that we outperform existing methods and that mFID improves on several datasets with over 25 points. | ||||
Address | Virtual; October 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICCV | ||
Notes | LAMP; 600.147; 602.200; 600.120 | Approved | no | ||
Call Number | Admin @ si @ WLW2021 | Serial | 3604 | ||
Permanent link to this record | |||||
Author | Dorota Kaminska; Kadir Aktas; Davit Rizhinashvili; Danila Kuklyanov; Abdallah Hussein Sham; Sergio Escalera; Kamal Nasrollahi; Thomas B. Moeslund; Gholamreza Anbarjafari | ||||
Title | Two-stage Recognition and Beyond for Compound Facial Emotion Recognition | Type | Journal Article | ||
Year | 2021 | Publication | Electronics | Abbreviated Journal | ELEC |
Volume | 10 | Issue | 22 | Pages | 2847 |
Keywords | compound emotion recognition; facial expression recognition; dominant and complementary emotion recognition; deep learning | ||||
Abstract | Facial emotion recognition is an inherently complex problem due to individual diversity in facial features and racial and cultural differences. Moreover, facial expressions typically reflect the mixture of people’s emotional statuses, which can be expressed using compound emotions. Compound facial emotion recognition makes the problem even more difficult because the discrimination between dominant and complementary emotions is usually weak. We have created a database that includes 31,250 facial images with different emotions of 115 subjects whose gender distribution is almost uniform to address compound emotion recognition. In addition, we have organized a competition based on the proposed dataset, held at FG workshop 2020. This paper analyzes the winner’s approach—a two-stage recognition method (1st stage, coarse recognition; 2nd stage, fine recognition), which enhances the classification of symmetrical emotion labels. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ KAR2021 | Serial | 3642 | ||
Permanent link to this record | |||||
Author | Jialuo Chen; Mohamed Ali Souibgui; Alicia Fornes; Beata Megyesi | ||||
Title | Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images | Type | Conference Article | ||
Year | 2021 | Publication | 4th International Conference on Historical Cryptology | Abbreviated Journal | |
Volume | Issue | Pages | 34-37 | ||
Keywords | |||||
Abstract | Historical ciphers contain a wide range ofsymbols from various symbol sets. Iden-tifying the cipher alphabet is a prerequi-site before decryption can take place andis a time-consuming process. In this workwe explore the use of image processing foridentifying the underlying alphabet in ci-pher images, and to compare alphabets be-tween ciphers. The experiments show thatciphers with similar alphabets can be suc-cessfully discovered through clustering. | ||||
Address | Virtual; September 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | HistoCrypt | ||
Notes | DAG; 602.230; 600.140; 600.121 | Approved | no | ||
Call Number | Admin @ si @ CSF2021 | Serial | 3617 | ||
Permanent link to this record | |||||
Author | Albert Rial-Farras; Meysam Madadi; Sergio Escalera | ||||
Title | UV-based reconstruction of 3D garments from a single RGB image | Type | Conference Article | ||
Year | 2021 | Publication | 16th IEEE International Conference on Automatic Face and Gesture Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1-8 | ||
Keywords | |||||
Abstract | Garments are highly detailed and dynamic objects made up of particles that interact with each other and with other objects, making the task of 2D to 3D garment reconstruction extremely challenging. Therefore, having a lightweight 3D representation capable of modelling fine details is of great importance. This work presents a deep learning framework based on Generative Adversarial Networks (GANs) to reconstruct 3D garment models from a single RGB image. It has the peculiarity of using UV maps to represent 3D data, a lightweight representation capable of dealing with high-resolution details and wrinkles. With this model and kind of 3D representation, we achieve state-of-the-art results on the CLOTH3D++ dataset, generating good quality and realistic garment reconstructions regardless of the garment topology and shape, human pose, occlusions and lightning. | ||||
Address | Virtual; December 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | FG | ||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ RME2021 | Serial | 3639 | ||
Permanent link to this record | |||||
Author | Carola Figueroa Flores | ||||
Title | Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency | Type | Book Whole | ||
Year | 2021 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | computer vision; visual saliency; fine-grained object recognition; convolutional neural networks; images classification | ||||
Abstract | For humans, the recognition of objects is an almost instantaneous, precise and
extremely adaptable process. Furthermore, we have the innate capability to learn new object classes from only few examples. The human brain lowers the complexity of the incoming data by filtering out part of the information and only processing those things that capture our attention. This, mixed with our biological predisposition to respond to certain shapes or colors, allows us to recognize in a simple glance the most important or salient regions from an image. This mechanism can be observed by analyzing on which parts of images subjects place attention; where they fix their eyes when an image is shown to them. The most accurate way to record this behavior is to track eye movements while displaying images. Computational saliency estimation aims to identify to what extent regions or objects stand out with respect to their surroundings to human observers. Saliency maps can be used in a wide range of applications including object detection, image and video compression, and visual tracking. The majority of research in the field has focused on automatically estimating saliency maps given an input image. Instead, in this thesis, we set out to incorporate saliency maps in an object recognition pipeline: we want to investigate whether saliency maps can improve object recognition results. In this thesis, we identify several problems related to visual saliency estimation. First, to what extent the estimation of saliency can be exploited to improve the training of an object recognition model when scarce training data is available. To solve this problem, we design an image classification network that incorporates saliency information as input. This network processes the saliency map through a dedicated network branch and uses the resulting characteristics to modulate the standard bottom-up visual characteristics of the original image input. We will refer to this technique as saliency-modulated image classification (SMIC). In extensive experiments on standard benchmark datasets for fine-grained object recognition, we show that our proposed architecture can significantly improve performance, especially on dataset with scarce training data. Next, we address the main drawback of the above pipeline: SMIC requires an explicit saliency algorithm that must be trained on a saliency dataset. To solve this, we implement a hallucination mechanism that allows us to incorporate the saliency estimation branch in an end-to-end trained neural network architecture that only needs the RGB image as an input. A side-effect of this architecture is the estimation of saliency maps. In experiments, we show that this architecture can obtain similar results on object recognition as SMIC but without the requirement of ground truth saliency maps to train the system. Finally, we evaluated the accuracy of the saliency maps that occur as a sideeffect of object recognition. For this purpose, we use a set of benchmark datasets for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps that are computed from human eye-tracking experiments. Our results show that these saliency maps can obtain competitive results on benchmark saliency maps. On one synthetic saliency dataset this method even obtains the state-of-the-art without the need of ever having seen an actual saliency image for training. |
||||
Address | March 2021 | ||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Joost Van de Weijer;Bogdan Raducanu | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-122714-4-7 | Medium | ||
Area | Expedition | Conference | |||
Notes | LAMP; 600.120 | Approved | no | ||
Call Number | Admin @ si @ Fig2021 | Serial | 3600 | ||
Permanent link to this record | |||||
Author | Idoia Ruiz; Lorenzo Porzi; Samuel Rota Bulo; Peter Kontschieder; Joan Serrat | ||||
Title | Weakly Supervised Multi-Object Tracking and Segmentation | Type | Conference Article | ||
Year | 2021 | Publication | IEEE Winter Conference on Applications of Computer Vision Workshops | Abbreviated Journal | |
Volume | Issue | Pages | 125-133 | ||
Keywords | |||||
Abstract | We introduce the problem of weakly supervised MultiObject Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation.
To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just 12% and 12.7 % for cars and pedestrians, respectively. |
||||
Address | Virtual; January 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACVW | ||
Notes | ADAS; 600.118; 600.124 | Approved | no | ||
Call Number | Admin @ si @ RPR2021 | Serial | 3548 | ||
Permanent link to this record | |||||
Author | Javad Zolfaghari Bengar; Bogdan Raducanu; Joost Van de Weijer | ||||
Title | When Deep Learners Change Their Mind: Learning Dynamics for Active Learning | Type | Conference Article | ||
Year | 2021 | Publication | 19th International Conference on Computer Analysis of Images and Patterns | Abbreviated Journal | |
Volume | 13052 | Issue | 1 | Pages | 403-413 |
Keywords | |||||
Abstract | Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results. | ||||
Address | September 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CAIP | ||
Notes | LAMP; | Approved | no | ||
Call Number | Admin @ si @ ZRV2021 | Serial | 3673 | ||
Permanent link to this record | |||||
Author | Zhengying Liu; Adrien Pavao; Zhen Xu; Sergio Escalera; Fabio Ferreira; Isabelle Guyon; Sirui Hong; Frank Hutter; Rongrong Ji; Julio C. S. Jacques Junior; Ge Li; Marius Lindauer; Zhipeng Luo; Meysam Madadi; Thomas Nierhoff; Kangning Niu; Chunguang Pan; Danny Stoll; Sebastien Treguer; Jin Wang; Peng Wang; Chenglin Wu; Youcheng Xiong; Arber Zela; Yang Zhang | ||||
Title | Winning Solutions and Post-Challenge Analyses of the ChaLearn AutoDL Challenge 2019 | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Transactions on Pattern Analysis and Machine Intelligence | Abbreviated Journal | TPAMI |
Volume | 43 | Issue | 9 | Pages | 3108 - 3125 |
Keywords | |||||
Abstract | This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sorting out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings, but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks, with limited time and computational resources, pushing solutions that get results quickly. In this setting, DL methods dominated, though popular Neural Architecture Search (NAS) was impractical. Solutions relied on fine-tuned pre-trained networks, with architectures matching data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high level modular organization emerged featuring a “meta-learner”, “data ingestor”, “model selector”, “model/learner”, and “evaluator”. This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes an ever-lasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free “AutoDL self-service.” | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ LPX2021 | Serial | 3587 | ||
Permanent link to this record |