toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Hugo Bertiche; Meysam Madadi; Sergio Escalera edit   pdf
url  doi
openurl 
  Title Deep Parametric Surfaces for 3D Outfit Reconstruction from Single View Image Type Conference Article
  Year 2021 Publication 16th IEEE International Conference on Automatic Face and Gesture Recognition Abbreviated Journal  
  Volume Issue Pages 1-8  
  Keywords  
  Abstract (up) We present a methodology to retrieve analytical surfaces parametrized as a neural network. Previous works on 3D reconstruction yield point clouds, voxelized objects or meshes. Instead, our approach yields 2-manifolds in the euclidean space through deep learning. To this end, we implement a novel formulation for fully connected layers as parametrized manifolds that allows continuous predictions with differential geometry. Based on this property we propose a novel smoothness loss. Results on CLOTH3D++ dataset show the possibility to infer different topologies and the benefits of the smoothness term based on differential geometry.  
  Address Virtual; December 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference FG  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ BME2021 Serial 3640  
Permanent link to this record
 

 
Author Minesh Mathew; Dimosthenis Karatzas; C.V. Jawahar edit   pdf
openurl 
  Title DocVQA: A Dataset for VQA on Document Images Type Conference Article
  Year 2021 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages 2200-2209  
  Keywords  
  Abstract (up) We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding structure of the document is crucial. The dataset, code and leaderboard are available at docvqa. org  
  Address Virtual; January 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ MKJ2021 Serial 3498  
Permanent link to this record
 

 
Author Hugo Bertiche; Meysam Madadi; Emilio Tylson; Sergio Escalera edit   pdf
url  openurl
  Title DeePSD: Automatic Deep Skinning And Pose Space Deformation For 3D Garment Animation Type Conference Article
  Year 2021 Publication 19th IEEE International Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages 5471-5480  
  Keywords  
  Abstract (up) We present a novel solution to the garment animation problem through deep learning. Our contribution allows animating any template outfit with arbitrary topology and geometric complexity. Recent works develop models for garment edition, resizing and animation at the same time by leveraging the support body model (encoding garments as body homotopies). This leads to complex engineering solutions that suffer from scalability, applicability and compatibility. By limiting our scope to garment animation only, we are able to propose a simple model that can animate any outfit, independently of its topology, vertex order or connectivity. Our proposed architecture maps outfits to animated 3D models into the standard format for 3D animation (blend weights and blend shapes matrices), automatically providing of compatibility with any graphics engine. We also propose a methodology to complement supervised learning with an unsupervised physically based learning that implicitly solves collisions and enhances cloth quality.  
  Address Virtual; October 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCV  
  Notes HUPBA; no menciona Approved no  
  Call Number Admin @ si @ BMT2021 Serial 3606  
Permanent link to this record
 

 
Author Reza Azad; Afshin Bozorgpour; Maryam Asadi-Aghbolaghi; Dorit Merhof; Sergio Escalera edit   pdf
openurl 
  Title Deep Frequency Re-Calibration U-Net for Medical Image Segmentation Type Conference Article
  Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal  
  Volume Issue Pages 3274-3283  
  Keywords  
  Abstract (up) We present a novel solution to the garment animation problem through deep learning. Our contribution allows animating any template outfit with arbitrary topology and geometric complexity. Recent works develop models for garment edition, resizing and animation at the same time by leveraging the support body model (encoding garments as body homotopies). This leads to complex engineering solutions that suffer from scalability, applicability and compatibility. By limiting our scope to garment animation only, we are able to propose a simple model that can animate any outfit, independently of its topology, vertex order or connectivity. Our proposed architecture maps outfits to animated 3D models into the standard format for 3D animation (blend weights and blend shapes matrices), automatically providing of compatibility with any graphics engine. We also propose a methodology to complement supervised learning with an unsupervised physically based learning that implicitly solves collisions and enhances cloth quality.  
  Address VIRTUAL; October 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ ABA2021 Serial 3645  
Permanent link to this record
 

 
Author Swathikiran Sudhakaran; Sergio Escalera;Oswald Lanz edit   pdf
url  doi
openurl 
  Title Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries Type Journal Article
  Year 2021 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume Issue Pages  
  Keywords  
  Abstract (up) We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ SEL2021 Serial 3656  
Permanent link to this record
 

 
Author Marc Masana; Tinne Tuytelaars; Joost Van de Weijer edit   pdf
doi  openurl
  Title Ternary Feature Masks: zero-forgetting for task-incremental learning Type Conference Article
  Year 2021 Publication 34th IEEE Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal  
  Volume Issue Pages 3565-3574  
  Keywords  
  Abstract (up) We propose an approach without any forgetting to continual learning for the task-aware regime, where at inference the task-label is known. By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them. Using masks prevents both catastrophic forgetting and backward transfer. We argue -- and show experimentally -- that avoiding the former largely compensates for the lack of the latter, which is rarely observed in practice. In contrast to earlier works, our masks are applied to the features (activations) of each layer instead of the weights. This considerably reduces the number of mask parameters for each new task; with more than three orders of magnitude for most networks. The encoding of the ternary masks into two bits per feature creates very little overhead to the network, avoiding scalability issues. To allow already learned features to adapt to the current task without changing the behavior of these features for previous tasks, we introduce task-specific feature normalization. Extensive experiments on several finegrained datasets and ImageNet show that our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.  
  Address Virtual; June 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ MTW2021 Serial 3565  
Permanent link to this record
 

 
Author David Aldavert edit  isbn
openurl 
  Title Efficient and Scalable Handwritten Word Spotting on Historical Documents using Bag of Visual Words Type Book Whole
  Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (up) Word spotting can be defined as the pattern recognition tasked aimed at locating and retrieving a specific keyword within a document image collection without explicitly transcribing the whole corpus. Its use is particularly interesting when applied in scenarios where Optical Character Recognition performs poorly or can not be used at all. This thesis focuses on such a scenario, word spotting on historical handwritten documents that have been written by a single author or by multiple authors with a similar calligraphy.
This problem requires a visual signature that is robust to image artifacts, flexible to accommodate script variations and efficient to retrieve information in a rapid manner. For this, we have developed a set of word spotting methods that on their foundation use the well known Bag-of-Visual-Words (BoVW) representation. This representation has gained popularity among the document image analysis community to characterize handwritten words
in an unsupervised manner. However, most approaches on this field rely on a basic BoVW configuration and disregard complex encoding and spatial representations. We determine which BoVW configurations provide the best performance boost to a spotting system.
Then, we extend the segmentation-based word spotting, where word candidates are given a priori, to segmentation-free spotting. The proposed approach seeds the document images with overlapping word location candidates and characterizes them with a BoVW signature. Retrieval is achieved comparing the query and candidate signatures and returning the locations that provide a higher consensus. This is a simple but powerful approach that requires a more compact signature than in a segmentation-based scenario. We first
project the BoVW signature into a reduced semantic topics space and then compress it further using Product Quantizers. The resulting signature only requires a few dozen bytes, allowing us to index thousands of pages on a common desktop computer. The final system still yields a performance comparable to the state-of-the-art despite all the information loss during the compression phases.
Afterwards, we also study how to combine different modalities of information in order to create a query-by-X spotting system where, words are indexed using an information modality and queries are retrieved using another. We consider three different information modalities: visual, textual and audio. Our proposal is to create a latent feature space where features which are semantically related are projected onto the same topics. Creating thus a new feature space where information from different modalities can be compared. Later, we consider the codebook generation and descriptor encoding problem. The codebooks used to encode the BoVW signatures are usually created using an unsupervised clustering algorithm and, they require to test multiple parameters to determine which configuration is best for a certain document collection. We propose a semantic clustering algorithm which allows to estimate the best parameter from data. Since gather annotated data is costly, we use synthetically generated word images. The resulting codebook is database agnostic, i. e. a codebook that yields a good performance on document collections that use the same script. We also propose the use of an additional codebook to approximate descriptors and reduce the descriptor encoding
complexity to sub-linear.
Finally, we focus on the problem of signatures dimensionality. We propose a new symbol probability signature where each bin represents the probability that a certain symbol is present a certain location of the word image. This signature is extremely compact and combined with compression techniques can represent word images with just a few bytes per signature.
 
  Address April 2021  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Marçal Rusiñol;Josep Llados  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-122714-5-4 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ Ald2021 Serial 3601  
Permanent link to this record
 

 
Author Shiqi Yang; Kai Wang; Luis Herranz; Joost Van de Weijer edit   pdf
url  doi
openurl 
  Title On Implicit Attribute Localization for Generalized Zero-Shot Learning Type Journal Article
  Year 2021 Publication IEEE Signal Processing Letters Abbreviated Journal  
  Volume 28 Issue Pages 872 - 876  
  Keywords  
  Abstract (up) Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their attribute-based descriptions. Since attributes are often related to specific parts of objects, many recent works focus on discovering discriminative regions. However, these methods usually require additional complex part detection modules or attention mechanisms. In this paper, 1) we show that common ZSL backbones (without explicit attention nor part detection) can implicitly localize attributes, yet this property is not exploited. 2) Exploiting it, we then propose SELAR, a simple method that further encourages attribute localization, surprisingly achieving very competitive generalized ZSL (GZSL) performance when compared with more complex state-of-the-art methods. Our findings provide useful insight for designing future GZSL methods, and SELAR provides an easy to implement yet strong baseline.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number YWH2021 Serial 3563  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: