Author (up) Pau Rodriguez; Josep M. Gonfaus; Guillem Cucurull; Xavier Roca; Jordi Gonzalez
Title Attend and Rectify: A Gated Attention Mechanism for Fine-Grained Recognition Type Conference Article
Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal
Volume 11212 Issue Pages 357-372
Keywords Deep Learning; Convolutional Neural Networks; Attention
Abstract We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend to lower-level feature activations without requiring part annotations and uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent and efficient both in terms of parameters and computation required. Experiments show that networks augmented with our approach systematically improve their classification accuracy and become more robust to clutter. As a result, Wide Residual Networks augmented with our proposal surpass the state-of-the-art classification accuracies in CIFAR-10, the Adience gender recognition task, Stanford dogs, and UEC Food-100.
Address Munich; September 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCV
Notes ISE; 600.098; 602.121; 600.119 Approved no
Call Number Admin @ si @ RGC2018 Serial 3139
Permanent link to this record
 

 
Author (up) Pau Rodriguez; Miguel Angel Bautista; Sergio Escalera; Jordi Gonzalez
Title Beyond One-hot Encoding: lower dimensional target embedding Type Journal Article
Year 2018 Publication Image and Vision Computing Abbreviated Journal IMAVIS
Volume 75 Issue Pages 21-31
Keywords Error correcting output codes; Output embeddings; Deep learning; Computer vision
Abstract Target encoding plays a central role when learning Convolutional Neural Networks. In this realm, one-hot encoding is the most prevalent strategy due to its simplicity. However, this widespread encoding scheme assumes a flat label space, thus ignoring rich relationships existing among labels that can be exploited during training. In large-scale datasets, data does not span the full label space, but instead lies in a low-dimensional output manifold. Following this observation, we embed the targets into a low-dimensional space, drastically improving convergence speed while preserving accuracy. Our contribution is twofold: (i) we show that random projections of the label space are a valid tool to find such lower dimensional embeddings, dramatically boosting convergence rates at zero computational cost; and (ii) we propose a normalized eigenrepresentation of the class manifold that encodes the targets with minimal information loss, improving the accuracy of random projections encoding while enjoying the same convergence rates. Experiments on CIFAR-100, CUB200-2011, Imagenet, and MIT Places demonstrate that the proposed approach drastically improves convergence speed while reaching very competitive accuracy rates.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; HuPBA; 600.098; 602.133; 602.121; 600.119 Approved no
Call Number Admin @ si @ RBE2018 Serial 3120
Permanent link to this record
 

 
Author (up) Pau Torras; Arnau Baro; Alicia Fornes; Lei Kang
Title Improving Handwritten Music Recognition through Language Model Integration Type Conference Article
Year 2022 Publication 4th International Workshop on Reading Music Systems (WoRMS2022) Abbreviated Journal
Volume Issue Pages 42-46
Keywords optical music recognition; historical sources; diversity; music theory; digital humanities
Abstract Handwritten Music Recognition, especially in the historical domain, is an inherently challenging endeavour; paper degradation artefacts and the ambiguous nature of handwriting make recognising such scores an error-prone process, even for the current state-of-the-art Sequence to Sequence models. In this work we propose a way of reducing the production of statistically implausible output sequences by fusing a Language Model into a recognition Sequence to Sequence model. The idea is to leverage visually-conditioned and context-conditioned output distributions in order to automatically find and correct any mistakes that would otherwise break context significantly. We have found this approach to improve recognition results to 25.15 SER (%) from a previous best of 31.79 SER (%) in the literature.
Address November 18, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WoRMS
Notes DAG; 600.121; 600.162; 602.230 Approved no
Call Number Admin @ si @ TBF2022 Serial 3735
Permanent link to this record
 

 
Author (up) Pau Torras; Arnau Baro; Lei Kang; Alicia Fornes
Title On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition Type Conference Article
Year 2021 Publication International Society for Music Information Retrieval Conference Abbreviated Journal
Volume Issue Pages 690-696
Keywords
Abstract Despite the latest advances in Deep Learning, the recognition of handwritten music scores is still a challenging endeavour. Even though the recent Sequence to Sequence (Seq2Seq) architectures have demonstrated their capacity to reliably recognise handwritten text, their performance is still far from satisfactory when applied to historical handwritten scores. Indeed, the ambiguous nature of handwriting, the non-standard musical notation employed by composers of the time and the decaying state of old paper make these scores remarkably difficult to read, sometimes even by trained humans. Thus, in this work we explore the incorporation of language models into a Seq2Seq-based architecture to try to improve transcriptions where the aforementioned unclear writing produces statistically unsound mistakes, which, as far as we know, has never been attempted for this field of research on this architecture. After studying various Language Model integration techniques, the experimental evaluation on historical handwritten music scores shows a significant improvement over the state of the art, showing that this is a promising research direction for dealing with such difficult manuscripts.
Address Virtual; November 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ISMIR
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ TBK2021 Serial 3616
Permanent link to this record
 

 
Author (up) Pau Torras; Mohamed Ali Souibgui; Jialuo Chen; Alicia Fornes
Title A Transcription Is All You Need: Learning to Align through Attention Type Conference Article
Year 2021 Publication 14th IAPR International Workshop on Graphics Recognition Abbreviated Journal
Volume 12916 Issue Pages 141-146
Keywords
Abstract Historical ciphered manuscripts are a type of document where graphical symbols are used to encrypt their content instead of regular text. Nowadays, expert transcriptions can be found in libraries alongside the corresponding manuscript images. However, those transcriptions are not aligned, so they are barely usable for training deep learning-based recognition methods. To solve this issue, we propose a method to align each symbol in the transcript of an image with its visual representation by using an attention-based Sequence to Sequence (Seq2Seq) model. The core idea is that, by learning to recognise the symbol sequence within a cipher line image, the model also identifies the symbols' positions implicitly through an attention mechanism. Thus, the resulting symbol segmentation can be later used for training algorithms. The experimental evaluation shows that this method is promising, especially taking into account the small size of the cipher dataset.
Address Virtual; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference GREC
Notes DAG; 602.230; 600.140; 600.121 Approved no
Call Number Admin @ si @ TSC2021 Serial 3619
Permanent link to this record
 

 
Author (up) Pau Torras; Mohamed Ali Souibgui; Sanket Biswas; Alicia Fornes
Title Segmentation-Free Alignment of Arbitrary Symbol Transcripts to Images Type Conference Article
Year 2023 Publication Document Analysis and Recognition – ICDAR 2023 Workshops Abbreviated Journal
Volume 14193 Issue Pages 83-93
Keywords Historical Manuscripts; Symbol Alignment
Abstract Developing arbitrary symbol recognition systems is a challenging endeavour. Even using content-agnostic architectures such as few-shot models, performance can be substantially improved by providing a number of well-annotated examples during training. In some contexts, transcripts of the symbols are available without any position information associated with them, which enables using line-level recognition architectures. A way of providing this position information to detection-based architectures is finding systems that can align the input symbols with the transcription. In this paper we discuss some symbol alignment techniques that are suitable for low-data scenarios and provide insight into their perceived strengths and weaknesses. In particular, we study the usage of Connectionist Temporal Classification models and Attention-Based Sequence to Sequence models, and we compare them with the results obtained on a few-shot recognition system.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ TSS2023 Serial 3850
Permanent link to this record
 

 
Author (up) Paula Fritzsche; C. Roig; Ana Ripoll; Emilio Luque; Aura Hernandez-Sabate
Title A Performance Prediction Methodology for Data-dependent Parallel Applications Type Conference Article
Year 2006 Publication Proceedings of the IEEE International Conference on Cluster Computing Abbreviated Journal
Volume Issue Pages 1-8
Keywords
Abstract The increase in the use of parallel distributed architectures in order to solve large-scale scientific problems has generated the need for performance prediction for both deterministic applications and non-deterministic applications. In particular, the performance prediction of data dependent programs is an extremely challenging problem because for a specific issue the input datasets may cause different execution times. Generally, a parallel application is characterized as a collection of tasks and their interrelations. If the application is time-critical it is not enough to work with only one value per task, and consequently knowledge of the distribution of task execution times is crucial. The development of a new prediction methodology to estimate the performance of data-dependent parallel applications is the primary target of this study. This approach makes it possible to evaluate the parallel performance of an application without the need of implementation. A real data-dependent arterial structure detection application model is used to apply the methodology proposed. The predicted times obtained using the new methodology for genuine datasets are compared with predicted times that arise from using only one execution value per task. Finally, the experimental study shows that the new methodology generates more precise predictions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM Approved no
Call Number IAM @ iam @ FRR2006 Serial 1497
Permanent link to this record
 

 
Author (up) Pedro Herruzo; Marc Bolaños; Petia Radeva
Title Can a CNN Recognize Catalan Diet? Type Book Chapter
Year 2016 Publication AIP Conference Proceedings Abbreviated Journal
Volume 1773 Issue Pages
Keywords
Abstract CoRR abs/1607.08811
Nowadays, we can find several diseases related to the unhealthy diet habits of the population, such as diabetes, obesity, anemia, bulimia and anorexia. In many cases, these diseases are related to the food consumption of people. Mediterranean diet is scientifically known as a healthy diet that helps to prevent many metabolic diseases. In particular, our work focuses on the recognition of Mediterranean food and dishes. The development of this methodology would make it possible to analyze the daily habits of users with wearable cameras, within the topic of lifelogging. By using automatic mechanisms we could build an objective tool for the analysis of the patient’s behavior, allowing specialists to discover unhealthy food patterns and understand the user’s lifestyle.
With the aim of automatically recognizing a complete diet, we introduce a challenging multi-labeled dataset related to Mediterranean diet called FoodCAT. The first type of label provided consists of 115 food classes with an average of 400 images per dish, and the second one consists of 12 food categories with an average of 3800 pictures per class. This dataset will serve as a basis for the development of automatic diet recognition. In this context, deep learning and more specifically, Convolutional Neural Networks (CNNs), currently are state-of-the-art methods for automatic food recognition. In our work, we compare several architectures for image classification, with the purpose of diet recognition. Applying the best model for recognising food categories, we achieve a top-1 accuracy of 72.29%, and top-5 of 97.07%. In a complete diet recognition of dishes from Mediterranean diet, enlarged with the Food-101 dataset for international dishes recognition, we achieve a top-1 accuracy of 68.07%, and top-5 of 89.53%, for a total of 115+101 food classes.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ HBR2016 Serial 2837
Permanent link to this record
 

 
Author (up) Pedro Martins; Carlo Gatta; Paulo Carvalho
Title Feature-driven Maximally Stable Extremal Regions Type Conference Article
Year 2012 Publication 7th International Conference on Computer Vision Theory and Applications Abbreviated Journal
Volume Issue Pages 490-497
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISAPP
Notes MILAB Approved no
Call Number Admin @ si @ MGC2012 Serial 2139
Permanent link to this record
 

 
Author (up) Pedro Martins; Paulo Carvalho; Carlo Gatta
Title Context Aware Keypoint Extraction for Robust Image Representation Type Conference Article
Year 2012 Publication 23rd British Machine Vision Conference Abbreviated Journal
Volume Issue Pages 100.1-100.12
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference BMVC
Notes MILAB Approved no
Call Number Admin @ si @ MCG2012a Serial 2140
Permanent link to this record
 

 
Author (up) Pedro Martins; Paulo Carvalho; Carlo Gatta
Title Stable Salient Shapes Type Conference Article
Year 2012 Publication International Conference on Digital Image Computing: Techniques and Applications Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DICTA
Notes MILAB Approved no
Call Number Admin @ si @ MCG2012b Serial 2166
Permanent link to this record
 

 
Author (up) Pedro Martins; Paulo Carvalho; Carlo Gatta
Title Context-aware features and robust image representations Type Journal Article
Year 2014 Publication Journal of Visual Communication and Image Representation Abbreviated Journal JVCIR
Volume 25 Issue 2 Pages 339-348
Keywords
Abstract Local image features are often used to efficiently represent image content. The limited number of types of features that a local feature extractor responds to might be insufficient to provide a robust image representation. To overcome this limitation, we propose a context-aware feature extraction formulated under an information theoretic framework. The algorithm does not respond to a specific type of features; the idea is to retrieve complementary features which are relevant within the image context. We empirically validate the method by investigating the repeatability, the completeness, and the complementarity of context-aware features on standard benchmarks. In a comparison with strictly local features, we show that our context-aware features produce more robust image representations. Furthermore, we study the complementarity between strictly local features and context-aware ones to produce an even more robust representation.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.079; MILAB Approved no
Call Number Admin @ si @ MCG2014 Serial 2467
Permanent link to this record
 

 
Author (up) Pedro Martins; Paulo Carvalho; Carlo Gatta
Title On the completeness of feature-driven maximally stable extremal regions Type Journal Article
Year 2016 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 74 Issue Pages 9-16
Keywords Local features; Completeness; Maximally Stable Extremal Regions
Abstract By definition, local image features provide a compact representation of the image in which most of the image information is preserved. This capability offered by local features has been overlooked, despite being relevant in many application scenarios. In this paper, we analyze and discuss the performance of feature-driven Maximally Stable Extremal Regions (MSER) in terms of the coverage of informative image parts (completeness). This type of feature results from an MSER extraction on saliency maps in which features related to object boundaries or even symmetry axes are highlighted. These maps are intended to be suitable domains for MSER detection, allowing this detector to provide a better coverage of informative image parts. Our experimental results, which were based on a large-scale evaluation, show that feature-driven MSER have relatively high completeness values and provide more complete sets than a traditional MSER detection even when sets of similar cardinality are considered.
Address
Corporate Author Thesis
Publisher Elsevier B.V. Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0167-8655 ISBN Medium
Area Expedition Conference
Notes LAMP; MILAB Approved no
Call Number Admin @ si @ MCG2016 Serial 2748
Permanent link to this record
 

 
Author (up) Pejman Rasti; Salma Samiei; Mary Agoyi; Sergio Escalera; Gholamreza Anbarjafari
Title Robust non-blind color video watermarking using QR decomposition and entropy analysis Type Journal Article
Year 2016 Publication Journal of Visual Communication and Image Representation Abbreviated Journal JVCIR
Volume 38 Issue Pages 838-847
Keywords Video watermarking; QR decomposition; Discrete Wavelet Transformation; Chirp Z-transform; Singular value decomposition; Orthogonal–triangular decomposition
Abstract Issues such as content identification, document and image security, audience measurement, ownership and copyright among others can be settled by the use of digital watermarking. Many recent video watermarking methods show drops in visual quality of the sequences. The present work addresses the aforementioned issue by introducing a robust and imperceptible non-blind color video frame watermarking algorithm. The method divides frames into moving and non-moving parts. The non-moving part of each color channel is processed separately using a block-based watermarking scheme. Blocks with an entropy lower than the average entropy of all blocks are subject to a further process for embedding the watermark image. Finally, a watermarked frame is generated by adding moving parts to it. In the experiments, several signal processing attacks are applied to each watermarked frame, and the results are compared with those of some recent algorithms. Experimental results show that the proposed scheme is imperceptible and robust against common signal processing attacks.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; MILAB Approved no
Call Number Admin @ si @ RSA2016 Serial 2766
Permanent link to this record
 

 
Author (up) Pejman Rasti; Tonis Uiboupin; Sergio Escalera; Gholamreza Anbarjafari
Title Convolutional Neural Network Super Resolution for Face Recognition in Surveillance Monitoring Type Conference Article
Year 2016 Publication 9th Conference on Articulated Motion and Deformable Objects Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Palma de Mallorca; Spain; July 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AMDO
Notes HuPBA; MILAB Approved no
Call Number Admin @ si @ RUE2016 Serial 2846
Permanent link to this record