Records | |||||
---|---|---|---|---|---|
Author | Lichao Zhang; Abel Gonzalez-Garcia; Joost Van de Weijer; Martin Danelljan; Fahad Shahbaz Khan | ||||
Title | Learning the Model Update for Siamese Trackers | Type | Conference Article | ||
Year | 2019 | Publication | 18th IEEE International Conference on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 4009-4018 | ||
Keywords | |||||
Abstract | Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time. While such an approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. Therefore, we propose to replace the handcrafted update function with a method which learns to update. We use a convolutional neural network, called UpdateNet, which given the initial template, the accumulated template and the template of the current frame aims to estimate the optimal template for the next frame. The UpdateNet is compact and can easily be integrated into existing Siamese trackers. We demonstrate the generality of the proposed approach by applying it to two Siamese trackers, SiamFC and DaSiamRPN. Extensive experiments on VOT2016, VOT2018, LaSOT, and TrackingNet datasets demonstrate that our UpdateNet effectively predicts the new target template, outperforming the standard linear update. On the large-scale TrackingNet dataset, our UpdateNet improves the results of DaSiamRPN with an absolute gain of 3.9% in terms of success score. | ||||
Address | Seoul; Korea; October 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICCV | ||
Notes | LAMP; 600.109; 600.141; 600.120 | Approved | no | ||
Call Number | Admin @ si @ ZGW2019 | Serial | 3295 | ||
Permanent link to this record | |||||
Author | Lichao Zhang; Martin Danelljan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan | ||||
Title | Multi-Modal Fusion for End-to-End RGB-T Tracking | Type | Conference Article | ||
Year | 2019 | Publication | IEEE International Conference on Computer Vision Workshops | Abbreviated Journal | |
Volume | Issue | Pages | 2252-2261 | ||
Keywords | |||||
Abstract | We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on the VOT-RGBT2019 and RGBT210 datasets, evaluating each type of modality fusion on each model component. The results show that the proposed fusion mechanisms improve the performance of the single modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on the VOT-RGBT2019 dataset. With this fusion mechanism we achieve state-of-the-art performance on the RGBT210 dataset. | ||||
Address | Seoul; Korea; October 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICCVW | ||
Notes | LAMP; 600.109; 600.141; 600.120 | Approved | no | ||
Call Number | Admin @ si @ ZDG2019 | Serial | 3279 | ||
Permanent link to this record | |||||
Author | Liu Wenyin; Josep Llados; Jean-Marc Ogier | ||||
Title | Graphics Recognition. Recent Advances and New Opportunities. | Type | Book Whole | ||
Year | 2008 | Publication | 7th International Workshop, Selected Papers | Abbreviated Journal | |
Volume | 5046 | Issue | Pages | ||
Keywords | |||||
Abstract | |||||
Address | Curitiba (Brazil) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-540-88184-1 | Medium | ||
Area | Expedition | Conference | GREC | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ WLO2008 | Serial | 1012 | ||
Permanent link to this record | |||||
Author | Lluis Barcelo | ||||
Title | Accurate video mosaicing with moving objects | Type | Report | ||
Year | 2002 | Publication | CVC Technical Report # 59 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | CVC (UAB) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ Bar2002 | Serial | 326 | ||
Permanent link to this record | |||||
Author | Lluis Barcelo; X. Binefa | ||||
Title | Bayesian Video Mosaicing with Moving Objects. | Type | Miscellaneous | ||
Year | 2001 | Publication | Proceedings of the IX Spanish Symposium on Pattern Recognition and Image Analysis, 1:91–96. | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ BaB2001 | Serial | 72 | ||
Permanent link to this record | |||||
Author | Lluis Barcelo; X. Binefa | ||||
Title | Bayesian Video Mosaicing with moving objects | Type | Journal | ||
Year | 2002 | Publication | International Journal of Pattern Recognition and Artificial Intelligence, 16(3): 341–348 (IF: 0.359) | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ BaB2002 | Serial | 268 | ||
Permanent link to this record | |||||
Author | Lluis Garrido; M. Guerrieri; Laura Igual | ||||
Title | Image Segmentation with Cage Active Contours | Type | Journal Article | ||
Year | 2015 | Publication | IEEE Transactions on Image Processing | Abbreviated Journal | TIP |
Volume | 24 | Issue | 12 | Pages | 5557 - 5566 |
Keywords | Level sets; Mean value coordinates; Parametrized active contours | ||||
Abstract | In this paper, we present a framework for image segmentation based on parametrized active contours. The evolving contour is parametrized according to a reduced set of control points that form a closed polygon and have a clear visual interpretation. The parametrization, called mean value coordinates, stems from the techniques used in computer graphics to animate virtual models. Our framework makes it easy to formulate region-based energies to segment an image. In particular, we present three different local region-based energy terms: 1) the mean model; 2) the Gaussian model; and 3) the histogram model. We show the behavior of our method on synthetic and real images and compare the performance with state-of-the-art level set methods. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1057-7149 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ GGI2015 | Serial | 2673 | ||
Permanent link to this record | |||||
Author | Lluis Gomez | ||||
Title | Perceptual Organization for Text Extraction in Natural Scenes | Type | Report | ||
Year | 2012 | Publication | CVC Technical Report | Abbreviated Journal | |
Volume | 173 | Issue | Pages | ||
Keywords | |||||
Abstract | |||||
Address | Bellaterra | ||||
Corporate Author | Thesis | Master's thesis | |||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ Gom2012 | Serial | 2309 | ||
Permanent link to this record | |||||
Author | Lluis Gomez | ||||
Title | Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding | Type | Book Whole | ||
Year | 2016 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This thesis addresses the problem of automatic scene text understanding in unconstrained conditions. In particular, we tackle the tasks of multi-language and arbitrary-oriented text detection, tracking, and script identification in natural scenes. For this we have developed a set of generic methods that build on the basic observation that text always has certain key visual and structural characteristics that are independent of the language or script in which it is written. Text instances in any language or script are always formed as groups of similar atomic parts, be they individual characters, small stroke parts, or even whole words in the case of cursive text. This holistic (sum-of-parts) and recursive perspective has led us to explore different variants of the “segmentation and grouping” paradigm of computer vision. Scene text detection methodologies are usually based on classification of individual regions or patches, using a priori knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organization through which text emerges as a perceptually significant group of atomic objects. In this thesis, we argue that the text detection problem must be posed as the detection of meaningful groups of regions. We address the problem of text detection in natural scenes from a hierarchical perspective, making explicit use of the recursive nature of text, aiming directly at the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such a hierarchy, introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based on perceptual organization. Within this generic framework, we design a text-specific object proposals algorithm that, contrary to existing generic object proposals methods, aims directly at the detection of text region groupings. For this, we abandon the rigid definition of “what is text” of traditional specialized text detectors, and move towards the fuzzier perspective of grouping-based object proposals methods. Then, we present a hybrid algorithm for detection and tracking of scene text where the notion of region groupings also plays a central role. By leveraging the structural arrangement of text group components between consecutive frames we can improve the overall tracking performance of the system. Finally, since our generic detection framework is inherently designed for multi-language environments, we focus on the problem of script identification in order to build a multi-language end-to-end reading system. Facing this problem with state-of-the-art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed size as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme. | ||||
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Place of Publication | Editor | Dimosthenis Karatzas | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ Gom2016 | Serial | 2891 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Ali Furkan Biten; Ruben Tito; Andres Mafla; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | Multimodal grid features and cell pointers for scene text visual question answering | Type | Journal Article | ||
Year | 2021 | Publication | Pattern Recognition Letters | Abbreviated Journal | PRL |
Volume | 150 | Issue | Pages | 242-249 | |
Keywords | |||||
Abstract | This paper presents a new model for the task of scene text visual question answering. In this task, questions about a given image can only be answered by reading and understanding scene text. Current state-of-the-art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes it difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on a single attention mechanism that attends to multi-modal features conditioned on the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance on two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data are made available online. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.084; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GBT2021 | Serial | 3620 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Andres Mafla; Marçal Rusiñol; Dimosthenis Karatzas | ||||
Title | Single Shot Scene Text Retrieval | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11218 | Issue | Pages | 728-744 | |
Keywords | Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposals Networks; PHOC | ||||
Abstract | Textual information found in scene images provides high-level semantic information about the image and its context, and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single shot CNN architecture that predicts at the same time bounding boxes and a compact text representation of the words in them. In this way, the text-based image retrieval task can be cast as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image database. Our experiments demonstrate that the proposed architecture outperforms the previous state of the art while offering a significant increase in processing speed. | ||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | DAG; 600.084; 601.338; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GMR2018 | Serial | 3143 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Anguelos Nicolaou; Dimosthenis Karatzas | ||||
Title | Improving patch‐based scene text script identification with ensembles of conjoined networks | Type | Journal Article | ||
Year | 2017 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 67 | Issue | Pages | 85-96 | |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.084; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GNK2017 | Serial | 2887 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Anguelos Nicolaou; Marçal Rusiñol; Dimosthenis Karatzas | ||||
Title | 12 years of ICDAR Robust Reading Competitions: The evolution of reading systems for unconstrained text understanding | Type | Book Chapter | ||
Year | 2020 | Publication | Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | K. Alahari; C.V. Jawahar | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Series on Advances in Computer Vision and Pattern Recognition | Abbreviated Series Title | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | GNR2020 | Serial | 3494 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Dena Bazazian; Dimosthenis Karatzas | ||||
Title | Historical review of scene text detection research | Type | Book Chapter | ||
Year | 2020 | Publication | Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | K. Alahari; C.V. Jawahar | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Series on Advances in Computer Vision and Pattern Recognition | Abbreviated Series Title | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GBK2020 | Serial | 3495 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Multi-script Text Extraction from Natural Scenes | Type | Conference Article | ||
Year | 2013 | Publication | 12th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 467-471 | ||
Keywords | |||||
Abstract | Scene text extraction methodologies are usually based on classification of individual regions or patches, using a priori knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organisation through which text emerges as a perceptually significant group of atomic objects. Therefore humans are able to detect text even in languages and scripts never seen before. In this paper, we argue that the text extraction problem could be posed as the detection of meaningful groups of regions. We present a method built around a perceptual organisation framework that exploits collaboration of proximity and similarity laws to create text-group hypotheses. Experiments demonstrate that our algorithm is competitive with state-of-the-art approaches on a standard dataset covering text in variable orientations and two languages. | ||||
Address | Washington; USA; August 2013 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1520-5363 | ISBN | Medium | ||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.056; 601.158; 601.197 | Approved | no | ||
Call Number | Admin @ si @ GoK2013 | Serial | 2310 | ||
Permanent link to this record |