Publicacions CVC -- Query Results

<< 1 2 3 4 5 6 7 8 9 >>

Details

Records
Author	Shiqi Yang; Yaxing Wang; Joost Van de Weijer; Luis Herranz; Shangling Jui
Title	Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation			Type	Conference Article
Year	2021	Publication	Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Domain adaptation (DA) aims to alleviate the domain shift between source domain and target domain. Most DA methods require access to the source data, but often that is not possible (e.g. due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might no longer align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors, and propose a self regularization loss to decrease the negative impact of noisy neighbors. Furthermore, to aggregate information with more context, we consider expanded neighborhoods with small affinity values. In the experimental results we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets. Code is available in https://github.com/Albert0147/SFDA_neighbors.
Address	Online; December 7-10, 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	NIPS
Notes	LAMP; 600.147; 600.141			Approved	no
Call Number	Admin @ si @			Serial	3691
Permanent link to this record



Author	Shiqi Yang; Kai Wang; Luis Herranz; Joost Van de Weijer
Title	On Implicit Attribute Localization for Generalized Zero-Shot Learning			Type	Journal Article
Year	2021	Publication	IEEE Signal Processing Letters	Abbreviated Journal
Volume	28	Issue		Pages	872 - 876
Keywords
Abstract	Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their attribute-based descriptions. Since attributes are often related to specific parts of objects, many recent works focus on discovering discriminative regions. However, these methods usually require additional complex part detection modules or attention mechanisms. In this paper, 1) we show that common ZSL backbones (without explicit attention nor part detection) can implicitly localize attributes, yet this property is not exploited. 2) Exploiting it, we then propose SELAR, a simple method that further encourages attribute localization, surprisingly achieving very competitive generalized ZSL (GZSL) performance when compared with more complex state-of-the-art methods. Our findings provide useful insight for designing future GZSL methods, and SELAR provides an easy to implement yet strong baseline.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	YWH2021			Serial	3563
Permanent link to this record



Author	Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title	DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis			Type	Conference Article
Year	2021	Publication	16th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume	12823	Issue		Pages	555–568
Keywords
Abstract	Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images consistent with the defined layout. Also, this framework has been adapted to this work as a superior baseline model for creating synthetic document image datasets for augmenting real data during training for document layout analysis tasks. Different sets of learning objectives have been also used to improve the model performance. Quantitatively, we also compare the generated results of our model with real data using standard evaluation metrics. The results highlight that our model can successfully generate realistic and diverse document images with multiple objects. We also present a comprehensive qualitative analysis summary of the different scopes of synthetic image generation tasks. Lastly, to our knowledge this is the first work of its kind.
Address	Lausanne; Suissa; September 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.140; 110.312			Approved	no
Call Number	Admin @ si @ BRL2021a			Serial	3573
Permanent link to this record



Author	Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title	Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts			Type	Journal Article
Year	2021	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
Volume	24	Issue		Pages	269–281
Keywords
Abstract	Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.140; 110.312			Approved	no
Call Number	Admin @ si @ BRL2021b			Serial	3574
Permanent link to this record



Author	Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title	Graph-Based Deep Generative Modelling for Document Layout Generation			Type	Conference Article
Year	2021	Publication	16th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume	12917	Issue		Pages	525-537
Keywords
Abstract	One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, specially in digital mailroom applications. It is also the first graph-based approach for document layout generation task experimented on administrative document images, in this case, invoices.
Address	Lausanne; Suissa; September 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.140; 110.312			Approved	no
Call Number	Admin @ si @ BRL2021			Serial	3676
Permanent link to this record



Author	Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
Title	ICDAR 2021 Competition on Document Visual Question Answering			Type	Conference Article
Year	2021	Publication	16th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume		Issue		Pages	635-649
Keywords
Abstract	In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced on Infographics VQA. Infographics VQA is based on a new dataset of more than 5, 000 infographics images and 30, 000 question-answer pairs. The winner methods have scored 0.6120 ANLS in Infographics VQA task, 0.7743 ANLSL in Document Collection VQA task and 0.8705 ANLS in Single Document VQA. We present a summary of the datasets used for each task, description of each of the submitted methods and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.
Address	VIRTUAL; Lausanne; Suissa; September 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICDAR
Notes	DAG; 600.121			Approved	no
Call Number	Admin @ si @ TMJ2021			Serial	3624
Permanent link to this record



Author	Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title	Document Collection Visual Question Answering			Type	Conference Article
Year	2021	Publication	16th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume	12822	Issue		Pages	778-792
Keywords	Document collection; Visual Question Answering
Abstract	Current tasks and methods in Document Understanding aims to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices), that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA) a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset we propose a new evaluation metric and baselines which provide further insights to the new dataset and task.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICDAR
Notes	DAG; 600.121			Approved	no
Call Number	Admin @ si @ TKV2021			Serial	3622
Permanent link to this record



Author	Ricardo Dario Perez Principi; Cristina Palmero; Julio C. S. Jacques Junior; Sergio Escalera
Title	On the Effect of Observed Subject Biases in Apparent Personality Analysis from Audio-visual Signals			Type	Journal Article
Year	2021	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
Volume	12	Issue	3	Pages	607-621
Keywords
Abstract	Personality perception is implicitly biased due to many subjective factors, such as cultural, social, contextual, gender and appearance. Approaches developed for automatic personality perception are not expected to predict the real personality of the target, but the personality external observers attributed to it. Hence, they have to deal with human bias, inherently transferred to the training data. However, bias analysis in personality computing is an almost unexplored area. In this work, we study different possible sources of bias affecting personality perception, including emotions from facial expressions, attractiveness, age, gender, and ethnicity, as well as their influence on prediction ability for apparent personality estimation. To this end, we propose a multi-modal deep neural network that combines raw audio and visual information alongside predictions of attribute-specific models to regress apparent personality. We also analyse spatio-temporal aggregation schemes and the effect of different time intervals on first impressions. We base our study on the ChaLearn First Impressions dataset, consisting of one-person conversational videos. Our model shows state-of-the-art results regressing apparent personality based on the Big-Five model. Furthermore, given the interpretability nature of our network design, we provide an incremental analysis on the impact of each possible source of bias on final network predictions.
Address	1 July-Sept. 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no proj			Approved	no
Call Number	Admin @ si @ PPJ2019			Serial	3312
Permanent link to this record



Author	Reza Azad; Afshin Bozorgpour; Maryam Asadi-Aghbolaghi; Dorit Merhof; Sergio Escalera
Title	Deep Frequency Re-Calibration U-Net for Medical Image Segmentation			Type	Conference Article
Year	2021	Publication	IEEE/CVF International Conference on Computer Vision Workshops	Abbreviated Journal
Volume		Issue		Pages	3274-3283
Keywords
Abstract	We present a novel solution to the garment animation problem through deep learning. Our contribution allows animating any template outfit with arbitrary topology and geometric complexity. Recent works develop models for garment edition, resizing and animation at the same time by leveraging the support body model (encoding garments as body homotopies). This leads to complex engineering solutions that suffer from scalability, applicability and compatibility. By limiting our scope to garment animation only, we are able to propose a simple model that can animate any outfit, independently of its topology, vertex order or connectivity. Our proposed architecture maps outfits to animated 3D models into the standard format for 3D animation (blend weights and blend shapes matrices), automatically providing of compatibility with any graphics engine. We also propose a methodology to complement supervised learning with an unsupervised physically based learning that implicitly solves collisions and enhances cloth quality.
Address	VIRTUAL; October 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCVW
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ ABA2021			Serial	3645
Permanent link to this record



Author	Razieh Rastgoo; Kourosh Kiani; Sergio Escalera; Mohammad Sabokrou
Title	Sign Language Production: A Review			Type	Conference Article
Year	2021	Publication	Conference on Computer Vision and Pattern Recognition Workshops	Abbreviated Journal
Volume		Issue		Pages	3472-3481
Keywords
Abstract	Sign Language is the dominant yet non-primary form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are two necessary parts for making such a two-way system. Sign language recognition and production need to cope with some critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. This survey aims to briefly summarize recent achievements in SLP, discussing their advantages, limitations, and future directions of research.
Address	Virtual; June 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ RKE2021b			Serial	3603
Permanent link to this record



Author	Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
Title	Sign Language Recognition: A Deep Survey			Type	Journal Article
Year	2021	Publication	Expert Systems With Applications	Abbreviated Journal	ESWA
Volume	164	Issue		Pages	113794
Keywords
Abstract	Sign language, as a different form of the communication language, is important to large groups of people in society. There are different signs in each sign language with variability in hand shape, motion profile, and position of the hand, face, and body parts contributing to each sign. So, visual sign language recognition is a complex research area in computer vision. Many models have been proposed by different researchers with significant improvement by deep learning approaches in recent years. In this survey, we review the vision-based proposed models of sign language recognition using deep learning approaches from the last five years. While the overall trend of the proposed models indicates a significant improvement in recognition accuracy in sign language recognition, there are some challenges yet that need to be solved. We present a taxonomy to categorize the proposed models for isolated and continuous sign language recognition, discussing applications, datasets, hybrid models, complexity, and future lines of research in the field.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ RKE2021a			Serial	3521
Permanent link to this record



Author	Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla; Sabari Nathan; Priya Kansal; Armin Mehri; Parichehr Behjati Ardakani; A.Dalal; A.Akula; D.Sharma; S.Pandey; B.Kumar; J.Yao; R.Wu; KFeng; N.Li; Y.Zhao; H.Patel; V. Chudasama; K.Pjajapati; A.Sarvaiya; K.Upla; K.Raja; R.Ramachandra; C.Bush; F.Almasri; T.Vandamme; O.Debeir; N.Gutierrez; Q.Nguyen; W.Beksi
Title	Thermal Image Super-Resolution Challenge – PBVS 2021			Type	Conference Article
Year	2021	Publication	Conference on Computer Vision and Pattern Recognition Workshops	Abbreviated Journal
Volume		Issue		Pages	4359-4367
Keywords
Abstract	This paper presents results from the second Thermal Image Super-Resolution (TISR) challenge organized in the framework of the Perception Beyond the Visible Spectrum (PBVS) 2021 workshop. For this second edition, the same thermal image dataset considered during the first challenge has been used; only mid-resolution (MR) and high-resolution (HR) sets have been considered. The dataset consists of 951 training images and 50 testing images for each resolution. A set of 20 images for each resolution is kept aside for evaluation. The two evaluation methodologies proposed for the first challenge are also considered in this opportunity. The first evaluation task consists of measuring the PSNR and SSIM between the obtained SR image and the corresponding ground truth (i.e., the HR thermal image downsampled by four). The second evaluation also consists of measuring the PSNR and SSIM, but in this case, considers the x2 SR obtained from the given MR thermal image; this evaluation is performed between the SR image with respect to the semi-registered HR image, which has been acquired with another camera. The results outperformed those from the first challenge, thus showing an improvement in both evaluation metrics.
Address	Virtual; June 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	MSIAU; 600.130; 600.122			Approved	no
Call Number	Admin @ si @ RSV2021			Serial	3581
Permanent link to this record



Author	Pau Torras; Mohamed Ali Souibgui; Jialuo Chen; Alicia Fornes
Title	A Transcription Is All You Need: Learning to Align through Attention			Type	Conference Article
Year	2021	Publication	14th IAPR International Workshop on Graphics Recognition	Abbreviated Journal
Volume	12916	Issue		Pages	141–146
Keywords
Abstract	Historical ciphered manuscripts are a type of document where graphical symbols are used to encrypt their content instead of regular text. Nowadays, expert transcriptions can be found in libraries alongside the corresponding manuscript images. However, those transcriptions are not aligned, so these are barely usable for training deep learning-based recognition methods. To solve this issue, we propose a method to align each symbol in the transcript of an image with its visual representation by using an attention-based Sequence to Sequence (Seq2Seq) model. The core idea is that, by learning to recognise symbols sequence within a cipher line image, the model also identifies their position implicitly through an attention mechanism. Thus, the resulting symbol segmentation can be later used for training algorithms. The experimental evaluation shows that this method is promising, especially taking into account the small size of the cipher dataset.
Address	Virtual; September 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	GREC
Notes	DAG; 602.230; 600.140; 600.121			Approved	no
Call Number	Admin @ si @ TSC2021			Serial	3619
Permanent link to this record



Author	Pau Torras; Arnau Baro; Lei Kang; Alicia Fornes
Title	On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition			Type	Conference Article
Year	2021	Publication	International Society for Music Information Retrieval Conference	Abbreviated Journal
Volume		Issue		Pages	690-696
Keywords
Abstract	Despite the latest advances in Deep Learning, the recognition of handwritten music scores is still a challenging endeavour. Even though the recent Sequence to Sequence(Seq2Seq) architectures have demonstrated its capacity to reliably recognise handwritten text, their performance is still far from satisfactory when applied to historical handwritten scores. Indeed, the ambiguous nature of handwriting, the non-standard musical notation employed by composers of the time and the decaying state of old paper make these scores remarkably difficult to read, sometimes even by trained humans. Thus, in this work we explore the incorporation of language models into a Seq2Seq-based architecture to try to improve transcriptions where the aforementioned unclear writing produces statistically unsound mistakes, which as far as we know, has never been attempted for this field of research on this architecture. After studying various Language Model integration techniques, the experimental evaluation on historical handwritten music scores shows a significant improvement over the state of the art, showing that this is a promising research direction for dealing with such difficult manuscripts.
Address	Virtual; November 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ISMIR
Notes	DAG; 600.140; 600.121			Approved	no
Call Number	Admin @ si @ TBK2021			Serial	3616
Permanent link to this record



Author	Pau Riba; Sounak Dey; Ali Furkan Biten; Josep Llados
Title	Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild			Type	Miscellaneous
Year	2021	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute with a tough-to-beat baseline that without any specific SGOL training is able to outperform the previous works on a fixed set of classes. The baseline is useful to analyze the performance of SGOL approaches based on available simple yet powerful methods. We advance prior arts by proposing a sketch-conditioned DETR (DEtection TRansformer) architecture which avoids a hard classification and alleviates the domain gap between sketches and images to localize object instances. Although the main goal of SGOL is focused on object detection, we explored its natural extension to sketch-guided instance segmentation. This novel task allows to move towards identifying the objects at pixel level, which is of key importance in several applications. We experimentally demonstrate that our model and its variants significantly advance over previous state-of-the-art results. All training and testing code of our model will be released to facilitate future researchhttps://github.com/priba/sgol_wild.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121			Approved	no
Call Number	Admin @ si @ RDB2021			Serial	3674
Permanent link to this record