toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Klara Janousckova; Jiri Matas; Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Text Recognition – Real World Data and Where to Find Them Type Conference Article
  Year 2020 Publication 25th International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages (down) 4489-4496  
  Keywords  
  Abstract We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions. The method includes matching of imprecise transcriptions to weak annotations and an edit distance guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as “pseudo ground truth” (PGT). The method is applied to two weakly-annotated datasets. Training with the extracted PGT consistently improves the accuracy of a state of the art recognition model, by 3.7% on average, across different benchmark datasets (image domains) and 24.5% on one of the weakly annotated datasets 1 1 Acknowledgements. The authors were supported by Czech Technical University student grant SGS20/171/0HK3/3TJ13, the MEYS VVV project CZ.02.1.01/0.010.0J16 019/0000765 Research Center for Informatics, the Spanish Research project TIN2017-89779-P and the CERCA Programme / Generalitat de Catalunya.  
  Address Virtual; January 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPR  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ JMG2020 Serial 3557  
Permanent link to this record
 

 
Author Joan Marti; Jose Miguel Benedi; Ana Maria Mendonça; Joan Serrat edit  openurl
  Title Pattern Recognition and Image Analysis Type Book Whole
  Year 2007 Publication 3rd Iberian Conference Abbreviated Journal  
  Volume 6669 Issue Pages (down) 4477-4478  
  Keywords  
  Abstract  
  Address Girona (Spain)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IbPRIA  
  Notes ADAS Approved no  
  Call Number ADAS @ adas @ MBM2007 Serial 994  
Permanent link to this record
 

 
Author Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli edit  url
doi  openurl
  Title Seeing and Hearing Egocentric Actions: How Much Can We Learn? Type Conference Article
  Year 2019 Publication IEEE International Conference on Computer Vision Workshops Abbreviated Journal  
  Volume Issue Pages (down) 4470-4480  
  Keywords  
  Abstract Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have considered to integrate the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.  
  Address Seul; Korea; October 2019  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCVW  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ CLR2019b Serial 3385  
Permanent link to this record
 

 
Author Fahad Shahbaz Khan; Jiaolong Xu; Muhammad Anwer Rao; Joost Van de Weijer; Andrew Bagdanov; Antonio Lopez edit  doi
openurl 
  Title Recognizing Actions through Action-specific Person Detection Type Journal Article
  Year 2015 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP  
  Volume 24 Issue 11 Pages (down) 4422-4432  
  Keywords  
  Abstract Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) the existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, the direct training of action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, the transfer learning is able to adapt an existing detector to propose higher quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test tim- , outperforms on both data sets state-of-the-art methods, which do use person locations.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1057-7149 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; LAMP; 600.076; 600.079 Approved no  
  Call Number Admin @ si @ KXR2015 Serial 2668  
Permanent link to this record
 

 
Author Jose Manuel Alvarez; Ferran Diego; Joan Serrat; Antonio Lopez edit   pdf
doi  isbn
openurl 
  Title Automatic Ground-truthing using video registration for on-board detection algorithms Type Conference Article
  Year 2009 Publication 16th IEEE International Conference on Image Processing Abbreviated Journal  
  Volume Issue Pages (down) 4389 - 4392  
  Keywords  
  Abstract Ground-truth data is essential for the objective evaluation of object detection methods in computer vision. Many works claim their method is robust but they support it with experiments which are not quantitatively assessed with regard some ground-truth. This is one of the main obstacles to properly evaluate and compare such methods. One of the main reasons is that creating an extensive and representative ground-truth is very time consuming, specially in the case of video sequences, where thousands of frames have to be labelled. Could such a ground-truth be generated, at least in part, automatically? Though it may seem a contradictory question, we show that this is possible for the case of video sequences recorded from a moving camera. The key idea is transferring existing frame segmentations from a reference sequence into another video sequence recorded at a different time on the same track, possibly under a different ambient lighting. We have carried out experiments on several video sequence pairs and quantitatively assessed the precision of the transformed ground-truth, which prove that our approach is not only feasible but also quite accurate.  
  Address Cairo, Egypt  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1522-4880 ISBN 978-1-4244-5653-6 Medium  
  Area Expedition Conference ICIP  
  Notes ADAS Approved no  
  Call Number ADAS @ adas @ ADS2009 Serial 1201  
Permanent link to this record
 

 
Author Idoia Ruiz; Joan Serrat edit  doi
openurl 
  Title Hierarchical Novelty Detection for Traffic Sign Recognition Type Journal Article
  Year 2022 Publication Sensors Abbreviated Journal SENS  
  Volume 22 Issue 12 Pages (down) 4389  
  Keywords Novelty detection; hierarchical classification; deep learning; traffic sign recognition; autonomous driving; computer vision  
  Abstract Recent works have made significant progress in novelty detection, i.e., the problem of detecting samples of novel classes, never seen during training, while classifying those that belong to known classes. However, the only information this task provides about novel samples is that they are unknown. In this work, we leverage hierarchical taxonomies of classes to provide informative outputs for samples of novel classes. We predict their closest class in the taxonomy, i.e., its parent class. We address this problem, known as hierarchical novelty detection, by proposing a novel loss, namely Hierarchical Cosine Loss that is designed to learn class prototypes along with an embedding of discriminative features consistent with the taxonomy. We apply it to traffic sign recognition, where we predict the parent class semantics for new types of traffic signs. Our model beats state-of-the art approaches on two large scale traffic sign benchmarks, Mapillary Traffic Sign Dataset (MTSD) and Tsinghua-Tencent 100K (TT100K), and performs similarly on natural images benchmarks (AWA2, CUB). For TT100K and MTSD, our approach is able to detect novel samples at the correct nodes of the hierarchy with 81% and 36% of accuracy, respectively, at 80% known class accuracy.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.154 Approved no  
  Call Number Admin @ si @ RuS2022 Serial 3684  
Permanent link to this record
 

 
Author Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla; Sabari Nathan; Priya Kansal; Armin Mehri; Parichehr Behjati Ardakani; A.Dalal; A.Akula; D.Sharma; S.Pandey; B.Kumar; J.Yao; R.Wu; KFeng; N.Li; Y.Zhao; H.Patel; V. Chudasama; K.Pjajapati; A.Sarvaiya; K.Upla; K.Raja; R.Ramachandra; C.Bush; F.Almasri; T.Vandamme; O.Debeir; N.Gutierrez; Q.Nguyen; W.Beksi edit   pdf
url  doi
openurl 
  Title Thermal Image Super-Resolution Challenge – PBVS 2021 Type Conference Article
  Year 2021 Publication Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal  
  Volume Issue Pages (down) 4359-4367  
  Keywords  
  Abstract This paper presents results from the second Thermal Image Super-Resolution (TISR) challenge organized in the framework of the Perception Beyond the Visible Spectrum (PBVS) 2021 workshop. For this second edition, the same thermal image dataset considered during the first challenge has been used; only mid-resolution (MR) and high-resolution (HR) sets have been considered. The dataset consists of 951 training images and 50 testing images for each resolution. A set of 20 images for each resolution is kept aside for evaluation. The two evaluation methodologies proposed for the first challenge are also considered in this opportunity. The first evaluation task consists of measuring the PSNR and SSIM between the obtained SR image and the corresponding ground truth (i.e., the HR thermal image downsampled by four). The second evaluation also consists of measuring the PSNR and SSIM, but in this case, considers the x2 SR obtained from the given MR thermal image; this evaluation is performed between the SR image with respect to the semi-registered HR image, which has been acquired with another camera. The results outperformed those from the first challenge, thus showing an improvement in both evaluation metrics.  
  Address Virtual; June 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes MSIAU; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ RSV2021 Serial 3581  
Permanent link to this record
 

 
Author Claudio Baecchi; Francesco Turchini; Lorenzo Seidenari; Andrew Bagdanov; Alberto del Bimbo edit  openurl
  Title Fisher vectors over random density forest for object recognition Type Conference Article
  Year 2014 Publication 22nd International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages (down) 4328-4333  
  Keywords  
  Abstract  
  Address Stockholm; Sweden; August 2014  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPR  
  Notes LAMP; 600.079 Approved no  
  Call Number Admin @ si @ BTS2014 Serial 2518  
Permanent link to this record
 

 
Author Susana Alvarez; Maria Vanrell edit   pdf
url  doi
openurl 
  Title Texton theory revisited: a bag-of-words approach to combine textons Type Journal Article
  Year 2012 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 45 Issue 12 Pages (down) 4312-4325  
  Keywords  
  Abstract The aim of this paper is to revisit an old theory of texture perception and
update its computational implementation by extending it to colour. With this in mind we try to capture the optimality of perceptual systems. This is achieved in the proposed approach by sharing well-known early stages of the visual processes and extracting low-dimensional features that perfectly encode adequate properties for a large variety of textures without needing further learning stages. We propose several descriptors in a bag-of-words framework that are derived from different quantisation models on to the feature spaces. Our perceptual features are directly given by the shape and colour attributes of image blobs, which are the textons. In this way we avoid learning visual words and directly build the vocabularies on these lowdimensionaltexton spaces. Main differences between proposed descriptors rely on how co-occurrence of blob attributes is represented in the vocabularies. Our approach overcomes current state-of-art in colour texture description which is proved in several experiments on large texture datasets.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes CIC Approved no  
  Call Number Admin @ si @ AlV2012a Serial 2130  
Permanent link to this record
 

 
Author Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Scene Text Visual Question Answering Type Conference Article
  Year 2019 Publication 18th IEEE International Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages (down) 4291-4301  
  Keywords  
  Abstract Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting highlevel semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module. In addition we put forward a series of baseline methods, which provide further insight to the newly released dataset, and set the scene for further research.  
  Address Seul; Corea; October 2019  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCV  
  Notes DAG; 600.129; 600.135; 601.338; 600.121 Approved no  
  Call Number Admin @ si @ BTM2019b Serial 3285  
Permanent link to this record
 

 
Author Jaume Amores edit  doi
isbn  openurl
  Title Vocabulary-based Approaches for Multiple-Instance Data: a Comparative Study Type Conference Article
  Year 2010 Publication 20th International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages (down) 4246–4250  
  Keywords  
  Abstract Multiple Instance Learning (MIL) has become a hot topic and many different algorithms have been proposed in the last years. Despite this fact, there is a lack of comparative studies that shed light into the characteristics of the different methods and their behavior in different scenarios. In this paper we provide such an analysis. We include methods from different families, and pay special attention to vocabulary-based approaches, a new family of methods that has not received much attention in the MIL literature. The empirical comparison includes seven databases from four heterogeneous domains, implementations of eight popular MIL methods, and a study of the behavior under synthetic conditions. Based on this analysis, we show that, with an appropriate implementation, vocabulary-based approaches outperform other MIL methods in most of the cases, showing in general a more consistent performance.  
  Address Istanbul, Turkey  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1051-4651 ISBN 978-1-4244-7542-1 Medium  
  Area Expedition Conference ICPR  
  Notes ADAS Approved no  
  Call Number ADAS @ adas @ Amo2010 Serial 1295  
Permanent link to this record
 

 
Author Xavier Perez Sala; Sergio Escalera; Cecilio Angulo; Jordi Gonzalez edit   pdf
doi  openurl
  Title A survey on model based approaches for 2D and 3D visual human pose recovery Type Journal Article
  Year 2014 Publication Sensors Abbreviated Journal SENS  
  Volume 14 Issue 3 Pages (down) 4189-4210  
  Keywords human pose recovery; human body modelling; behavior analysis; computer vision  
  Abstract Human Pose Recovery has been studied in the field of Computer Vision for the last 40 years. Several approaches have been reported, and significant improvements have been obtained in both data representation and model design. However, the problem of Human Pose Recovery in uncontrolled environments is far from being solved. In this paper, we define a general taxonomy to group model based approaches for Human Pose Recovery, which is composed of five main modules: appearance, viewpoint, spatial relations, temporal consistence, and behavior. Subsequently, a methodological comparison is performed following the proposed taxonomy, evaluating current SoA approaches in the aforementioned five group categories. As a result of this comparison, we discuss the main advantages and drawbacks of the reviewed literature.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; ISE; 600.046; 600.063; 600.078;MILAB Approved no  
  Call Number Admin @ si @ PEA2014 Serial 2443  
Permanent link to this record
 

 
Author R. de Nijs; Sebastian Ramos; Gemma Roig; Xavier Boix; Luc Van Gool; K. Kühnlenz. edit   pdf
openurl 
  Title On-line Semantic Perception Using Uncertainty Type Conference Article
  Year 2012 Publication International Conference on Intelligent Robots and Systems Abbreviated Journal IROS  
  Volume Issue Pages (down) 4185-4191  
  Keywords Semantic Segmentation  
  Abstract Visual perception capabilities are still highly unreliable in unconstrained settings, and solutions might not beaccurate in all regions of an image. Awareness of the uncertainty of perception is a fundamental requirement for proper high level decision making in a robotic system. Yet, the uncertainty measure is often sacrificed to account for dependencies between object/region classifiers. This is the case of Conditional Random Fields (CRFs), the success of which stems from their ability to infer the most likely world configuration, but they do not directly allow to estimate the uncertainty of the solution. In this paper, we consider the setting of assigning semantic labels to the pixels of an image sequence. Instead of using a CRF, we employ a Perturb-and-MAP Random Field, a recently introduced probabilistic model that allows performing fast approximate sampling from its probability density function. This allows to effectively compute the uncertainty of the solution, indicating the reliability of the most likely labeling in each region of the image. We report results on the CamVid dataset, a standard benchmark for semantic labeling of urban image sequences. In our experiments, we show the benefits of exploiting the uncertainty by putting more computational effort on the regions of the image that are less reliable, and use more efficient techniques for other regions, showing little decrease of performance  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IROS  
  Notes ADAS Approved no  
  Call Number ADAS @ adas @ NRR2012 Serial 2378  
Permanent link to this record
 

 
Author M. Campos-Taberner; Adriana Romero; Carlo Gatta; Gustavo Camps-Valls edit  url
doi  openurl
  Title Shared feature representations of LiDAR and optical images: Trading sparsity for semantic discrimination Type Conference Article
  Year 2015 Publication IEEE International Geoscience and Remote Sensing Symposium IGARSS2015 Abbreviated Journal  
  Volume Issue Pages (down) 4169 - 4172  
  Keywords  
  Abstract This paper studies the level of complementary information conveyed by extremely high resolution LiDAR and optical images. We pursue this goal following an indirect approach via unsupervised spatial-spectral feature extraction. We used a recently presented unsupervised convolutional neural network trained to enforce both population and lifetime spar-sity in the feature representation. We derived independent and joint feature representations, and analyzed the sparsity scores and the discriminative power. Interestingly, the obtained results revealed that the RGB+LiDAR representation is no longer sparse, and the derived basis functions merge color and elevation yielding a set of more expressive colored edge filters. The joint feature representation is also more discriminative when used for clustering and topological data visualization.  
  Address Milan; Italy; July 2015  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IGARSS  
  Notes LAMP; 600.079;MILAB Approved no  
  Call Number Admin @ si @ CRG2015 Serial 2724  
Permanent link to this record
 

 
Author Joan Mas; Josep Llados; Gemma Sanchez; J.A. Jorge edit  url
doi  openurl
  Title A syntactic approach based on distortion-tolerant Adjacency Grammars and a spatial-directed parser to interpret sketched diagrams Type Journal Article
  Year 2010 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 43 Issue 12 Pages (down) 4148–4164  
  Keywords Syntactic Pattern Recognition; Symbol recognition; Diagram understanding; Sketched diagrams; Adjacency Grammars; Incremental parsing; Spatial directed parsing  
  Abstract This paper presents a syntactic approach based on Adjacency Grammars (AG) for sketch diagram modeling and understanding. Diagrams are a combination of graphical symbols arranged according to a set of spatial rules defined by a visual language. AG describe visual shapes by productions defined in terms of terminal and non-terminal symbols (graphical primitives and subshapes), and a set functions describing the spatial arrangements between symbols. Our approach to sketch diagram understanding provides three main contributions. First, since AG are linear grammars, there is a need to define shapes and relations inherently bidimensional using a sequential formalism. Second, our parsing approach uses an indexing structure based on a spatial tessellation. This serves to reduce the search space when finding candidates to produce a valid reduction. This allows order-free parsing of 2D visual sentences while keeping combinatorial explosion in check. Third, working with sketches requires a distortion model to cope with the natural variations of hand drawn strokes. To this end we extended the basic grammar with a distortion measure modeled on the allowable variation on spatial constraints associated with grammar productions. Finally, the paper reports on an experimental framework an interactive system for sketch analysis. User tests performed on two real scenarios show that our approach is usable in interactive settings.  
  Address  
  Corporate Author Thesis  
  Publisher Elsevier Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number DAG @ dag @ MLS2010 Serial 1336  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: