|   | 
Details
   web
Records
Author (up) Lichao Zhang; Abel Gonzalez-Garcia; Joost Van de Weijer; Martin Danelljan; Fahad Shahbaz Khan
Title Learning the Model Update for Siamese Trackers Type Conference Article
Year 2019 Publication 18th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 4009-4018
Keywords
Abstract Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time. While such an approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. Therefore, we propose to replace the handcrafted update function with a method which learns to update. We use a convolutional neural network, called UpdateNet, which given the initial template, the accumulated template and the template of the current frame aims to estimate the optimal template for the next frame. The UpdateNet is compact and can easily be integrated into existing Siamese trackers. We demonstrate the generality of the proposed approach by applying it to two Siamese trackers, SiamFC and DaSiamRPN. Extensive experiments on VOT2016, VOT2018, LaSOT, and TrackingNet datasets demonstrate that our UpdateNet effectively predicts the new target template, outperforming the standard linear update. On the large-scale TrackingNet dataset, our UpdateNet improves the results of DaSiamRPN with an absolute gain of 3.9% in terms of success score.
Address Seul; Corea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV
Notes LAMP; 600.109; 600.141; 600.120 Approved no
Call Number Admin @ si @ ZGW2019 Serial 3295
Permanent link to this record
 

 
Author (up) Lichao Zhang; Martin Danelljan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan
Title Multi-Modal Fusion for End-to-End RGB-T Tracking Type Conference Article
Year 2019 Publication IEEE International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 2252-2261
Keywords
Abstract We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on VOT-RGBT2019 dataset and RGBT210 dataset, evaluating each type of modality fusing on each model component. The results show that the proposed fusion mechanisms improve the performance of the single modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on VOT-RGBT2019 dataset. With this fusion mechanism we achieve the state-of-the-art performance on RGBT210 dataset.
Address Seul; Corea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes LAMP; 600.109; 600.141; 600.120 Approved no
Call Number Admin @ si @ ZDG2019 Serial 3279
Permanent link to this record
 

 
Author (up) Liu Wenyin; Josep Llados; Jean-Marc Ogier
Title Graphics Recognition. Recent Advances and New Opportunities. Type Book Whole
Year 2008 Publication 7th International Workshop, Selected Papers, Abbreviated Journal
Volume 5046 Issue Pages
Keywords
Abstract
Address Curitiba (Brazil)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-540-88184-1 Medium
Area Expedition Conference GREC
Notes DAG Approved no
Call Number DAG @ dag @ WLO2008 Serial 1012
Permanent link to this record
 

 
Author (up) Lluis Barcelo
Title Accurate video mosaicing with moving objects Type Report
Year 2002 Publication CVC Technical Report # 59 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address CVC (UAB)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Bar2002 Serial 326
Permanent link to this record
 

 
Author (up) Lluis Barcelo; X. Binefa
Title Bayesian Video Mosaicing with Moving Objects. Type Miscellaneous
Year 2001 Publication Proceedings of the IX Spanish Symposium on Pattern Recognition and Image Analysis, 1:91–96. Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ BaB2001 Serial 72
Permanent link to this record
 

 
Author (up) Lluis Barcelo; X. Binefa
Title Bayesian Video Mosaicing with moving objects Type Journal
Year 2002 Publication International Journal of Pattern Recognition and Artificial Intelligence, 16(3): 341–348 (IF: 0.359) Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ BaB2002 Serial 268
Permanent link to this record
 

 
Author (up) Lluis Garrido; M.Guerrieri; Laura Igual
Title Image Segmentation with Cage Active Contours Type Journal Article
Year 2015 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP
Volume 24 Issue 12 Pages 5557 - 5566
Keywords Level sets; Mean value coordinates; Parametrized active contours; level sets; mean value coordinates
Abstract In this paper, we present a framework for image segmentation based on parametrized active contours. The evolving contour is parametrized according to a reduced set of control points that form a closed polygon and have a clear visual interpretation. The parametrization, called mean value coordinates, stems from the techniques used in computer graphics to animate virtual models. Our framework allows to easily formulate region-based energies to segment an image. In particular, we present three different local region-based energy terms: 1) the mean model; 2) the Gaussian model; 3) and the histogram model. We show the behavior of our method on synthetic and real images and compare the performance with state-of-the-art level set methods.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1057-7149 ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ GGI2015 Serial 2673
Permanent link to this record
 

 
Author (up) Lluis Gomez
Title Perceptual Organization for Text Extraction in Natural Scenes Type Report
Year 2012 Publication CVC Technical Report Abbreviated Journal
Volume 173 Issue Pages
Keywords
Abstract
Address Bellaterra
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ Gom2012 Serial 2309
Permanent link to this record
 

 
Author (up) Lluis Gomez
Title Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding Type Book Whole
Year 2016 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This thesis addresses the problem of automatic scene text understanding in unconstrained conditions. In particular, we tackle the tasks of multi-language and arbitrary-oriented text detection, tracking, and script identification in natural scenes.
For this we have developed a set of generic methods that build on top of the basic observation that text has always certain key visual and structural characteristics that are independent of the language or script in which it is written. Text instances in any
language or script are always formed as groups of similar atomic parts, being them either individual characters, small stroke parts, or even whole words in the case of cursive text. This holistic (sumof-parts) and recursive perspective has lead us to explore different variants of the “segmentation and grouping” paradigm of computer vision.
Scene text detection methodologies are usually based in classification of individual regions or patches, using a priory knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organization through which
text emerges as a perceptually significant group of atomic objects.
In this thesis, we argue that the text detection problem must be posed as the detection of meaningful groups of regions. We address the problem of text detection in natural scenes from a hierarchical perspective, making explicit use of the recursive nature of text, aiming directly to the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such an hierarchy introducing a feature space designed to produce text group hypothese with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based in perceptual organization. Within this generic framework, we design a text-specific object proposals algorithm that, contrary to existing generic object proposals methods, aims directly to the detection of text regions groupings. For this, we abandon the rigid definition of “what is text” of traditional specialized text detectors, and move towards more fuzzy perspective of grouping-based object proposals methods.
Then, we present a hybrid algorithm for detection and tracking of scene text where the notion of region groupings plays also a central role. By leveraging the structural arrangement of text group components between consecutive frames we can improve
the overall tracking performance of the system.
Finally, since our generic detection framework is inherently designed for multi-language environments, we focus on the problem of script identification in order to build a multi-language end-toend reading system. Facing this problem with state of the art CNN classifiers is not straightforward, as they fail to address a key
characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed size as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Dimosthenis Karatzas
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ Gom2016 Serial 2891
Permanent link to this record
 

 
Author (up) Lluis Gomez; Ali Furkan Biten; Ruben Tito; Andres Mafla; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
Title Multimodal grid features and cell pointers for scene text visual question answering Type Journal Article
Year 2021 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 150 Issue Pages 242-249
Keywords
Abstract This paper presents a new model for the task of scene text visual question answering. In this task questions about a given image can only be answered by reading and understanding scene text. Current state of the art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on an single attention mechanism that attends to multi-modal features conditioned to the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance in two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data is made available through this link.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.084; 600.121 Approved no
Call Number Admin @ si @ GBT2021 Serial 3620
Permanent link to this record
 

 
Author (up) Lluis Gomez; Andres Mafla; Marçal Rusiñol; Dimosthenis Karatzas
Title Single Shot Scene Text Retrieval Type Conference Article
Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal
Volume 11218 Issue Pages 728-744
Keywords Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposals Networks; PHOC
Abstract Textual information found in scene images provides high level semantic information about the image and its context and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single shot CNN architecture that predicts at the same time bounding boxes and a compact text representation of the words in them. In this way, the text based image retrieval task can be casted as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image
database. Our experiments demonstrate that the proposed architecture
outperforms previous state-of-the-art while it offers a significant increase
in processing speed.
Address Munich; September 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCV
Notes DAG; 600.084; 601.338; 600.121; 600.129 Approved no
Call Number Admin @ si @ GMR2018 Serial 3143
Permanent link to this record
 

 
Author (up) Lluis Gomez; Anguelos Nicolaou; Dimosthenis Karatzas
Title Improving patch‐based scene text script identification with ensembles of conjoined networks Type Journal Article
Year 2017 Publication Pattern Recognition Abbreviated Journal PR
Volume 67 Issue Pages 85-96
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.084; 600.121; 600.129 Approved no
Call Number Admin @ si @ GNK2017 Serial 2887
Permanent link to this record
 

 
Author (up) Lluis Gomez; Anguelos Nicolaou; Marçal Rusiñol; Dimosthenis Karatzas
Title 12 years of ICDAR Robust Reading Competitions: The evolution of reading systems for unconstrained text understanding Type Book Chapter
Year 2020 Publication Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor K. Alahari; C.V. Jawahar
Language Summary Language Original Title
Series Editor Series Title Series on Advances in Computer Vision and Pattern Recognition Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number GNR2020 Serial 3494
Permanent link to this record
 

 
Author (up) Lluis Gomez; Dena Bazazian; Dimosthenis Karatzas
Title Historical review of scene text detection research Type Book Chapter
Year 2020 Publication Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor K. Alahari; C.V. Jawahar
Language Summary Language Original Title
Series Editor Series Title Series on Advances in Computer Vision and Pattern Recognition Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ GBK2020 Serial 3495
Permanent link to this record
 

 
Author (up) Lluis Gomez; Dimosthenis Karatzas
Title Multi-script Text Extraction from Natural Scenes Type Conference Article
Year 2013 Publication 12th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 467-471
Keywords
Abstract Scene text extraction methodologies are usually based in classification of individual regions or patches, using a priori knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organisation through which text emerges as a perceptually significant group of atomic objects. Therefore humans are able to detect text even in languages and scripts never seen before. In this paper, we argue that the text extraction problem could be posed as the detection of meaningful groups of regions. We present a method built around a perceptual organisation framework that exploits collaboration of proximity and similarity laws to create text-group hypotheses. Experiments demonstrate that our algorithm is competitive with state of the art approaches on a standard dataset covering text in variable orientations and two languages.
Address Washington; USA; August 2013
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1520-5363 ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.056; 601.158; 601.197 Approved no
Call Number Admin @ si @ GoK2013 Serial 2310
Permanent link to this record