toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Edgar Riba edit  openurl
  Title Geometric Computer Vision Techniques for Scene Reconstruction Type Book Whole
  Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords  
  Abstract From the early stages of Computer Vision, scene reconstruction has been one of the most studied topics leading to a wide variety of new discoveries and applications. Object grasping and manipulation, localization and mapping, or even visual effect generation are different examples of applications in which scene reconstruction has taken an important role for industries such as robotics, factory automation, or audio visual production. However, scene reconstruction is an extensive topic that can be approached in many different ways with already existing solutions that effectively work in controlled environments. Formally, the problem of scene reconstruction can be formulated as a sequence of independent processes which compose a pipeline. In this thesis, we analyse some parts of the reconstruction pipeline from which we contribute with novel methods using Convolutional Neural Networks (CNN) proposing innovative solutions that consider the optimisation of the methods in an end-to-end fashion. First, we review the state of the art of classical local features detectors and descriptors and contribute with two novel methods that inherently improve pre-existing solutions in the scene reconstruction pipeline.

It is a fact that computer science and software engineering are two fields that usually go hand in hand and evolve according to mutual needs making easier the design of complex and efficient algorithms. For this reason, we contribute with Kornia, a library specifically designed to work with classical computer vision techniques along with deep neural networks. In essence, we created a framework that eases the design of complex pipelines for computer vision algorithms so that can be included within neural networks and be used to backpropagate gradients throw a common optimisation framework. Finally, in the last chapter of this thesis we develop the aforementioned concept of designing end-to-end systems with classical projective geometry. Thus, we contribute with a solution to the problem of synthetic view generation by hallucinating novel views from high deformable cloths objects using a geometry aware end-to-end system. To summarize, in this thesis we demonstrate that with a proper design that combine classical geometric computer vision methods with deep learning techniques can lead to improve pre-existing solutions for the problem of scene reconstruction.
 
  Address February 2021  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Place of Publication Editor Daniel Ponsa  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ Rib2021 Serial 3610  
Permanent link to this record
 

 
Author Lei Kang; Pau Riba; Marcal Rusinol; Alicia Fornes; Mauricio Villegas edit  url
doi  openurl
  Title Content and Style Aware Generation of Text-line Images for Handwriting Recognition Type Journal Article
  Year 2021 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume Issue Pages (down)  
  Keywords  
  Abstract Handwritten Text Recognition has achieved an impressive performance in public benchmarks. However, due to the high inter- and intra-class variability between handwriting styles, such recognizers need to be trained using huge volumes of manually labeled training data. To alleviate this labor-consuming problem, synthetic data produced with TrueType fonts has been often used in the training loop to gain volume and augment the handwriting style variability. However, there is a significant style bias between synthetic and real data which hinders the improvement of recognition performance. To deal with such limitations, we propose a generative method for handwritten text-line images, which is conditioned on both visual appearance and textual content. Our method is able to produce long text-line samples with diverse handwriting styles. Once properly trained, our method can also be adapted to new target data by only accessing unlabeled text-line images to mimic handwritten styles and produce images with any textual content. Extensive experiments have been done on making use of the generated samples to boost Handwritten Text Recognition performance. Both qualitative and quantitative results demonstrate that the proposed approach outperforms the current state of the art.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ KRR2021 Serial 3612  
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Ali Furkan Biten; Sounak Dey; Alicia Fornes; Yousri Kessentini; Lluis Gomez; Dimosthenis Karatzas; Josep Llados edit   pdf
url  doi
openurl 
  Title One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition Type Conference Article
  Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords Document Analysis  
  Abstract Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This appears, for example, in the case of historical ciphered manuscripts, which are usually written with invented alphabets to hide the content. Thus, in this paper we address this problem through a data generation technique based on Bayesian Program Learning (BPL). Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet. After generating symbols, we create synthetic lines to train state-of-the-art HTR architectures in a segmentation free fashion. Quantitative and qualitative analyses were carried out and confirm the effectiveness of the proposed method, achieving competitive results compared to the usage of real annotated data.  
  Address Virtual; January 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 602.230; 600.140 Approved no  
  Call Number Admin @ si @ SBD2022 Serial 3615  
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz; Kadi Bouatouch edit  url
openurl 
  Title Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Volume 4) Type Book Whole
  Year 2021 Publication Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021 Abbreviated Journal  
  Volume 4 Issue Pages (down)  
  Keywords  
  Abstract This book contains the proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) which was organized and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), endorsed by the International Association for Pattern Recognition (IAPR), and in cooperation with the ACM Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), the European Association for Computer Graphics (EUROGRAPHICS), the EUROGRAPHICS Portuguese Chapter, the VRVis Center for Virtual Reality and Visualization Forschungs-GmbH, the French Association for Computer Graphics (AFIG), and the Society for Imaging Science and Technology (IS&T). The proceedings here published demonstrate new and innovative solutions and highlight technical problems in each field that are challenging and worthy of being disseminated to the interested research audiences. VISIGRAPP 2021 was organized to promote a discussion forum about the conference’s research topics between researchers, developers, manufacturers and end-users, and to establish guidelines in the development of more advanced solutions. This year VISIGRAPP was, exceptionally, held as a web-based event, due to the COVID-19 pandemic, from 8 – 10 February. We received a high number of paper submissions for this edition of VISIGRAPP, 371 in total, with contributions from 52 countries. This attests to the success and global dimension of VISIGRAPP. To evaluate each submission, we used a hierarchical process of double-blind evaluation where each paper was reviewed by two to six experts from the International Program Committee (IPC). The IPC selected for oral presentation and for publication as full papers 12 papers from GRAPP, 8 from HUCAPP, 11 papers from IVAPP, and 56 papers from VISAPP, which led to a result for the full-paper acceptance ratio of 24% and a high-quality program. Apart from the above full papers, the conference program also features 118 short papers and 67 poster presentations. We hope that these conference proceedings, which are submitted for indexation by Thomson Reuters Conference Proceedings Citation Index, SCOPUS, DBLP, Semantic Scholar, Google Scholar, EI and Microsoft Academic, will help the Computer Vision, Imaging, Visualization, Computer Graphics and Human-Computer Interaction communities to find interesting research work. Moreover, we are proud to inform that the program also includes three plenary keynote lectures, given by internationally distinguished researchers, namely Federico Tombari (Google and Technical University of Munich, Germany), Dieter Schmalstieg (Graz University of Technology, Austria) and Nathalie Henry Riche (Microsoft Research, United States), thus contributing to increase the overall quality of the conference and to provide a deeper understanding of the conference’s interest fields. Furthermore, a short list of the presented papers will be selected to be extended into a forthcoming book of VISIGRAPP Selected Papers to be published by Springer during 2021 in the CCIS series. Moreover, a short list of presented papers will be selected for publication of extended and revised versions in a special issue of the Springer Nature Computer Science journal. All papers presented at this conference will be available at the SCITEPRESS Digital Library. Three awards are delivered at the closing session, to recognize the best conference paper, the best student paper and the best poster for each of the four conferences. There is also an award for best industrial paper to be delivered at the closing session for VISAPP. We would like to express our thanks, first of all, to the authors of the technical papers, whose work and dedication made it possible to put together a program that we believe to be very exciting and of high technical quality. Next, we would like to thank the Area Chairs, all the members of the program committee and auxiliary reviewers, who helped us with their expertise and time. We would also like to thank the invited speakers for their invaluable contribution and for sharing their vision in their talks. Finally, we gratefully acknowledge the professional support of the INSTICC team for all organizational processes, especially given the need to introduce online streaming, forum management, direct messaging facilitation and other web-based activities in order to make it possible for VISIGRAPP 2021 authors to present their work and share ideas with colleagues in spite of the logistic difficulties caused by the current pandemic situation. We wish you all an exciting conference. We hope to meet you again for the next edition of VISIGRAPP, details of which are available at http://www. visigrapp.org  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference VISIGRAPP  
  Notes MILAB Approved no  
  Call Number Admin @ si @ FRB2021a Serial 3627  
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz; Kadi Bouatouch edit  url
openurl 
  Title Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications – (Volume 5) Type Book Whole
  Year 2021 Publication Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications – VISIGRAPP 2021 Abbreviated Journal  
  Volume 5 Issue Pages (down)  
  Keywords  
  Abstract This book contains the proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) which was organized and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), endorsed by the International Association for Pattern Recognition (IAPR), and in cooperation with the ACM Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), the European Association for Computer Graphics (EUROGRAPHICS), the EUROGRAPHICS Portuguese Chapter, the VRVis Center for Virtual Reality and Visualization Forschungs-GmbH, the French Association for Computer Graphics (AFIG), and the Society for Imaging Science and Technology (IS&T). The proceedings here published demonstrate new and innovative solutions and highlight technical problems in each field that are challenging and worthy of being disseminated to the interested research audiences. VISIGRAPP 2021 was organized to promote a discussion forum about the conference’s research topics between researchers, developers, manufacturers and end-users, and to establish guidelines in the development of more advanced solutions. This year VISIGRAPP was, exceptionally, held as a web-based event, due to the COVID-19 pandemic, from 8 – 10 February. We received a high number of paper submissions for this edition of VISIGRAPP, 371 in total, with contributions from 52 countries. This attests to the success and global dimension of VISIGRAPP. To evaluate each submission, we used a hierarchical process of double-blind evaluation where each paper was reviewed by two to six experts from the International Program Committee (IPC). The IPC selected for oral presentation and for publication as full papers 12 papers from GRAPP, 8 from HUCAPP, 11 papers from IVAPP, and 56 papers from VISAPP, which led to a result for the full-paper acceptance ratio of 24% and a high-quality program. Apart from the above full papers, the conference program also features 118 short papers and 67 poster presentations. We hope that these conference proceedings, which are submitted for indexation by Thomson Reuters Conference Proceedings Citation Index, SCOPUS, DBLP, Semantic Scholar, Google Scholar, EI and Microsoft Academic, will help the Computer Vision, Imaging, Visualization, Computer Graphics and Human-Computer Interaction communities to find interesting research work. Moreover, we are proud to inform that the program also includes three plenary keynote lectures, given by internationally distinguished researchers, namely Federico Tombari (Google and Technical University of Munich, Germany), Dieter Schmalstieg (Graz University of Technology, Austria) and Nathalie Henry Riche (Microsoft Research, United States), thus contributing to increase the overall quality of the conference and to provide a deeper understanding of the conference’s interest fields. Furthermore, a short list of the presented papers will be selected to be extended into a forthcoming book of VISIGRAPP Selected Papers to be published by Springer during 2021 in the CCIS series. Moreover, a short list of presented papers will be selected for publication of extended and revised versions in a special issue of the Springer Nature Computer Science journal. All papers presented at this conference will be available at the SCITEPRESS Digital Library. Three awards are delivered at the closing session, to recognize the best conference paper, the best student paper and the best poster for each of the four conferences. There is also an award for best industrial paper to be delivered at the closing session for VISAPP. We would like to express our thanks, first of all, to the authors of the technical papers, whose work and dedication made it possible to put together a program that we believe to be very exciting and of high technical quality. Next, we would like to thank the Area Chairs, all the members of the program committee and auxiliary reviewers, who helped us with their expertise and time. We would also like to thank the invited speakers for their invaluable contribution and for sharing their vision in their talks. Finally, we gratefully acknowledge the professional support of the INSTICC team for all organizational processes, especially given the need to introduce online streaming, forum management, direct messaging facilitation and other web-based activities in order to make it possible for VISIGRAPP 2021 authors to present their work and share ideas with colleagues in spite of the logistic difficulties caused by the current pandemic situation. We wish you all an exciting conference. We hope to meet you again for the next edition of VISIGRAPP, details of which are available at http://www. visigrapp.org.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference VISIGRAPP  
  Notes MILAB Approved no  
  Call Number Admin @ si @ FRB2021b Serial 3628  
Permanent link to this record
 

 
Author Vacit Oguz Yazici edit  isbn
openurl 
  Title Towards Smart Fashion: Visual Recognition of Products and Attributes Type Book Whole
  Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords  
  Abstract Artificial intelligence is innovating the fashion industry by proposing new applications and solutions to the problems encountered by researchers and engineers working in the industry. In this thesis, we address three of these problems. In the first part of the thesis, we tackle the problem of multi-label image classification which is very related to fashion attribute recognition. In the second part of the thesis, we address two problems that are specific to fashion. Firstly, we address the problem of main product detection which is the task of associating correct image parts (e.g. bounding boxes) with the fashion product being sold. Secondly, we address the problem of color naming for multicolored fashion items. The task of multi-label image classification consists in assigning various concepts such as objects or attributes to images. Usually, there are dependencies that can be learned between the concepts to capture label correlations (chair and table classes are more likely to co-exist than chair and giraffe).
If we treat the multi-label image classification problem as an orderless set prediction problem, we can exploit recurrent neural networks (RNN) to capture label correlations. However, RNNs are trained to predict ordered sequences of tokens, so if the order of the predicted sequence is different than the order of the ground truth sequence, there will be penalization although the predictions are correct. Therefore, in the first part of the thesis, we propose an orderless loss function which will order the labels in the ground truth sequence dynamically in a way that the minimum loss is achieved. This results in a significant improvement of RNN models on multi-label image classification over the previous methods.
However, RNNs suffer from long term dependencies when the cardinality of set grows bigger. The decoding process might stop early if the current hidden state cannot find any object and outputs the termination token. This would cause the remaining classes not to be predicted and lower recall metric. Transformers can be used to avoid the long term dependency problem exploiting their selfattention modules that process sequential data simultaneously. Consequently, we propose a novel transformer model for multi-label image classification which surpasses the state-of-the-art results by a large margin.
In the second part of thesis, we focus on two fashion-specific problems. Main product detection is the task of associating image parts with the fashion product that is being sold, generally using associated textual metadata (product title or description). Normally, in fashion e-commerces, products are represented by multiple images where a person wears the product along with other fashion items. If all the fashion items in the images are marked with bounding boxes, we can use the textual metadata to decide which item is the main product. The initial work treated each of these images independently, discarding the fact that they all belong to the same product. In this thesis, we represent the bounding boxes from all the images as nodes in a fully connected graph. This allows the algorithm to learn relations between the nodes during training and take the entire context into account for the final decision. Our algorithm results in a significant improvement of the state-ofthe-art.
Moreover, we address the problem of color naming for multicolored fashion items, which is a challenging task due to the external factors such as illumination changes or objects that act as clutter. In the context of multi-label classification, the vaguely defined lines between the classes in the color space cause ambiguity. For example, a shade of blue which is very close to green might cause the model to incorrectly predict the color blue and green at the same time. Based on this, models trained for color naming are expected to recognize the colors and their quantities in both single colored and multicolored fashion items. Therefore, in this thesis, we propose a novel architecture with an additional head that explicitly estimates the number of colors in fashion items. This removes the ambiguity problem and results in better color naming performance.
 
  Address January 2022  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Arnau Ramisa  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-122714-6-1 Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ Ogu2022 Serial 3631  
Permanent link to this record
 

 
Author Hugo Bertiche; Meysam Madadi; Sergio Escalera edit   pdf
openurl 
  Title PBNS: Physically Based Neural Simulation for Unsupervised Garment Pose Space Deformation Type Conference Article
  Year 2021 Publication 14th ACM Siggraph Conference and exhibition on Computer Graphics and Interactive Techniques in Asia Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords  
  Abstract We present a methodology to automatically obtain Pose Space Deformation (PSD) basis for rigged garments through deep learning. Classical approaches rely on Physically Based Simulations (PBS) to animate clothes. These are general solutions that, given a sufficiently fine-grained discretization of space and time, can achieve highly realistic results. However, they are computationally expensive and any scene modification prompts the need of re-simulation. Linear Blend Skinning (LBS) with PSD offers a lightweight alternative to PBS, though, it needs huge volumes of data to learn proper PSD. We propose using deep learning, formulated as an implicit PBS, to unsupervisedly learn realistic cloth Pose Space Deformations in a constrained scenario: dressed humans. Furthermore, we show it is possible to train these models in an amount of time comparable to a PBS of a few sequences. To the best of our knowledge, we are the first to propose a neural simulator for cloth.
While deep-based approaches in the domain are becoming a trend, these are data-hungry models. Moreover, authors often propose complex formulations to better learn wrinkles from PBS data. Supervised learning leads to physically inconsistent predictions that require collision solving to be used. Also, dependency on PBS data limits the scalability of these solutions, while their formulation hinders its applicability and compatibility. By proposing an unsupervised methodology to learn PSD for LBS models (3D animation standard), we overcome both of these drawbacks. Results obtained show cloth-consistency in the animated garments and meaningful pose-dependant folds and wrinkles. Our solution is extremely efficient, handles multiple layers of cloth, allows unsupervised outfit resizing and can be easily applied to any custom 3D avatar.
 
  Address Virtual; December 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SIGGRAPH  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ BME2021b Serial 3641  
Permanent link to this record
 

 
Author Swathikiran Sudhakaran; Sergio Escalera;Oswald Lanz edit   pdf
url  doi
openurl 
  Title Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries Type Journal Article
  Year 2021 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume Issue Pages (down)  
  Keywords  
  Abstract We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ SEL2021 Serial 3656  
Permanent link to this record
 

 
Author Pau Riba; Sounak Dey; Ali Furkan Biten; Josep Llados edit   pdf
openurl 
  Title Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild Type Miscellaneous
  Year 2021 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords  
  Abstract This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute with a tough-to-beat baseline that without any specific SGOL training is able to outperform the previous works on a fixed set of classes. The baseline is useful to analyze the performance of SGOL approaches based on available simple yet powerful methods. We advance prior arts by proposing a sketch-conditioned DETR (DEtection TRansformer) architecture which avoids a hard classification and alleviates the domain gap between sketches and images to localize object instances. Although the main goal of SGOL is focused on object detection, we explored its natural extension to sketch-guided instance segmentation. This novel task allows to move towards identifying the objects at pixel level, which is of key importance in several applications. We experimentally demonstrate that our model and its variants significantly advance over previous state-of-the-art results. All training and testing code of our model will be released to facilitate future researchhttps://github.com/priba/sgol_wild.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ RDB2021 Serial 3674  
Permanent link to this record
 

 
Author Josep Llados edit  openurl
  Title The 5G of Document Intelligence Type Conference Article
  Year 2021 Publication 3rd Workshop on Future of Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords  
  Abstract  
  Address Lausanne; Suissa; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference FDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ Serial 3677  
Permanent link to this record
 

 
Author Shiqi Yang; Yaxing Wang; Joost Van de Weijer; Luis Herranz; Shangling Jui edit   pdf
url  openurl
  Title Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation Type Conference Article
  Year 2021 Publication Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords  
  Abstract Domain adaptation (DA) aims to alleviate the domain shift between source domain and target domain. Most DA methods require access to the source data, but often that is not possible (e.g. due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might no longer align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors, and propose a self regularization loss to decrease the negative impact of noisy neighbors. Furthermore, to aggregate information with more context, we consider expanded neighborhoods with small affinity values. In the experimental results we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets. Code is available in https://github.com/Albert0147/SFDA_neighbors.  
  Address Online; December 7-10, 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference NIPS  
  Notes LAMP; 600.147; 600.141 Approved no  
  Call Number Admin @ si @ Serial 3691  
Permanent link to this record
 

 
Author Wenjuan Gong; Zhang Yue; Wei Wang; Cheng Peng; Jordi Gonzalez edit  doi
openurl 
  Title Meta-MMFNet: Meta-Learning Based Multi-Model Fusion Network for Micro-Expression Recognition Type Journal Article
  Year 2022 Publication ACM Transactions on Multimedia Computing, Communications, and Applications Abbreviated Journal ACMTMC  
  Volume Issue Pages (down)  
  Keywords Feature Fusion; Model Fusion; Meta-Learning; Micro-Expression Recognition  
  Abstract Despite its wide applications in criminal investigations and clinical communications with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and imbalanced classes problems. In this study, we proposed a meta-learning based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame difference and optical flow features were fused, deep features were extracted from the fused feature, and finally in the meta-learning-based framework, weighted sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method.  
  Address May 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE; 600.157 Approved no  
  Call Number Admin @ si @ GYW2022 Serial 3692  
Permanent link to this record
 

 
Author Mohamed Ramzy Ibrahim; Robert Benavente; Felipe Lumbreras; Daniel Ponsa edit   pdf
doi  openurl
  Title 3DRRDB: Super Resolution of Multiple Remote Sensing Images using 3D Residual in Residual Dense Blocks Type Conference Article
  Year 2022 Publication CVPR 2022 Workshop on IEEE Perception Beyond the Visible Spectrum workshop series (PBVS, 18th Edition) Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords Training; Solid modeling; Three-dimensional displays; PSNR; Convolution; Superresolution; Pattern recognition  
  Abstract The rapid advancement of Deep Convolutional Neural Networks helped in solving many remote sensing problems, especially the problems of super-resolution. However, most state-of-the-art methods focus more on Single Image Super-Resolution neglecting Multi-Image Super-Resolution. In this work, a new proposed 3D Residual in Residual Dense Blocks model (3DRRDB) focuses on remote sensing Multi-Image Super-Resolution for two different single spectral bands. The proposed 3DRRDB model explores the idea of 3D convolution layers in deeply connected Dense Blocks and the effect of local and global residual connections with residual scaling in Multi-Image Super-Resolution. The model tested on the Proba-V challenge dataset shows a significant improvement above the current state-of-the-art models scoring a Corrected Peak Signal to Noise Ratio (cPSNR) of 48.79 dB and 50.83 dB for Near Infrared (NIR) and RED Bands respectively. Moreover, the proposed 3DRRDB model scores a Corrected Structural Similarity Index Measure (cSSIM) of 0.9865 and 0.9909 for NIR and RED bands respectively.  
  Address New Orleans, USA; 19 June 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes MSIAU; 600.130 Approved no  
  Call Number Admin @ si @ IBL2022 Serial 3693  
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera; Vassilis Athitsos; Mohammad Sabokrou edit   pdf
doi  openurl
  Title All You Need In Sign Language Production Type Miscellaneous
  Year 2022 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords Sign Language Production; Sign Language Recog- nition; Sign Language Translation; Deep Learning; Survey; Deaf  
  Abstract Sign Language is the dominant form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental.
To this end, sign language recognition and production are two necessary parts for making such a two-way system. Signlanguage recognition and production need to cope with some critical challenges. In this survey, we review recent advances in
Sign Language Production (SLP) and related areas using deep learning. To have more realistic perspectives to sign language, we present an introduction to the Deaf culture, Deaf centers, psychological perspective of sign language, the main differences between spoken language and sign language. Furthermore, we present the fundamental components of a bi-directional sign language translation system, discussing the main challenges in this area. Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented. Finally, a general framework for SLP and performance evaluation, and also a discussion on the recent developments, advantages, and limitations in SLP, commenting on possible lines for future research are presented.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; no menciona Approved no  
  Call Number Admin @ si @ RKE2022c Serial 3698  
Permanent link to this record
 

 
Author Bojana Gajic; Ariel Amato; Ramon Baldrich; Joost Van de Weijer; Carlo Gatta edit   pdf
doi  openurl
  Title Area Under the ROC Curve Maximization for Metric Learning Type Conference Article
  Year 2022 Publication CVPR 2022 Workshop on Efficien Deep Learning for Computer Vision (ECV 2022, 5th Edition) Abbreviated Journal  
  Volume Issue Pages (down)  
  Keywords Training; Computer vision; Conferences; Area measurement; Benchmark testing; Pattern recognition  
  Abstract Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (which is a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work that proved that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximated, derivable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves comparable performance to more complex, domain specific, state-of-the-art methods for vehicle re-identification.  
  Address New Orleans, USA; 20 June 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes CIC; LAMP; Approved no  
  Call Number Admin @ si @ GAB2022 Serial 3700  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: