Author Dena Bazazian
Title Fully Convolutional Networks for Text Understanding in Scene Images Type Book Whole
Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Text understanding in scene images has gained plenty of attention in the computer vision community and it is an important task in many applications as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated since an inaccurate localization can affect the recognition task.
The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results on deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved strong performance on semantic segmentation and pixel-level classification tasks. Therefore, we took advantage of the strengths of FCN approaches in order to detect text in natural scenes. In this thesis we have focused on two challenging tasks of scene text understanding: Text Detection and Word Spotting. For the task of text detection, we have proposed an efficient text proposal technique for scene images. We have taken the Text Proposals method as the baseline; it is an approach that reduces the search space of possible text regions in an image. In order to improve the Text Proposals method we combined it with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same level of accuracy, thus gaining a significant speed-up. Our experiments demonstrate that this text proposal approach yields significantly higher recall rates than line-based text localization techniques, while also producing better-quality localization. We have also applied this technique to compressed images such as videos from wearable egocentric cameras. For the task of word spotting, we have introduced a novel mid-level word representation method. We have proposed a technique to create and exploit an intermediate representation of images based on text attributes which roughly correspond to character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC.
Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks through an efficient text line proposal algorithm. To evaluate the detected text, we propose a novel line-based evaluation along with the classic bounding-box-based approach. We test our method on incidental scene text images, which comprise real-life scenarios such as urban scenes. Incidental scene text images are important because of the complexity of their backgrounds, perspective, variety of script and language, short text and little linguistic context. All of these factors together make incidental scene text images challenging.
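As background for the PHOC descriptor that the abstract builds on, the following is a minimal sketch of a plain binary PHOC as it is commonly described: the word is split into regions at several pyramid levels, and a character is assigned to a region when at least half of its span falls inside it. The alphabet, the pyramid levels and the function name `phoc` are assumptions made for illustration; this is not the Soft-PHOC of the thesis, which is a pixel-wise representation produced by a network.

```python
import string

# Assumed alphabet for illustration: lowercase letters plus digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(1, 2, 3)):
    """Binary PHOC sketch: one block of len(ALPHABET) bits per region per level."""
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            # Work in integer units of 1 / (n * level) to avoid float rounding.
            r_start, r_end = region * n, (region + 1) * n
            bits = [0] * len(ALPHABET)
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue  # characters outside the assumed alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # The character span has length `level` in these units;
                # assign it to the region if at least half of it lies inside.
                if 2 * overlap >= level:
                    bits[ALPHABET.index(ch)] = 1
            vector.extend(bits)
    return vector
```

With `levels=(1, 2)`, the word "ab" sets both letters at level 1, only "a" in the left half and only "b" in the right half.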
Address November 2018
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication (up) Editor Dimosthenis Karatzas;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-1-1 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Baz2018 Serial 3220
 

 
Author Arnau Ramisa; Adriana Tapus; David Aldavert; Ricardo Toledo; Ramon Lopez de Mantaras
Title Robust Vision-Based Localization using Combinations of Local Feature Regions Detectors Type Journal Article
Year 2009 Publication Autonomous Robots Abbreviated Journal AR
Volume 27 Issue 4 Pages 373-385
Keywords
Abstract This paper presents a vision-based approach for mobile robot localization. The model of the environment is topological. The new approach characterizes a place using a signature. This signature consists of a constellation of descriptors computed over different types of local affine covariant regions extracted from an omnidirectional image acquired by rotating a standard camera with a pan-tilt unit. This type of representation permits reliable and distinctive environment modelling. Our objectives were to validate the proposed method in indoor environments and also to find out whether combining complementary local feature region detectors improves localization compared with using a single region detector. Our experimental results show that, if false matches are effectively rejected, the combination of different covariant affine region detectors notably increases the performance of the approach by combining the different strengths of the individual detectors. In order to reduce the localization time, two strategies are evaluated: re-ranking the map nodes using a global similarity measure, and using a standard perspective field of view of 45°.
In order to systematically test topological localization methods, another contribution of this work is a novel method to measure the degradation in localization performance as the robot moves away from the point where the original signature was acquired. This makes it possible to assess the robustness of the proposed signature. For this to be effective, it must be done in several varied environments that test all the possible situations in which the robot may have to perform localization.
Address
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0929-5593 ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ RTA2009 Serial 1245
 

 
Author David Aldavert; Ricardo Toledo; Arnau Ramisa; Ramon Lopez de Mantaras
Title Efficient Object Pixel-Level Categorization using Bag of Features: Advances in Visual Computing Type Conference Article
Year 2009 Publication 5th International Symposium on Visual Computing Abbreviated Journal
Volume 5875 Issue Pages 44–55
Keywords
Abstract In this paper we present a pixel-level object categorization method suitable for use under real-time constraints. Since pixels are categorized using a bag of features scheme, the major bottleneck of such an approach would be the feature pooling in local histograms of visual words. Therefore, we propose to bypass this time-consuming step and directly obtain the score from a linear Support Vector Machine classifier. This is achieved by creating an integral image of the components of the SVM, which can readily provide the classification score for any image sub-window with only 10 additions and 2 products, regardless of its size. In addition, we evaluated the performance of two efficient feature quantization methods: the Hierarchical K-Means and the Extremely Randomized Forest. All experiments were conducted on the Graz02 database, showing results comparable to, or even better than, related work at a lower computational cost.
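The idea of scoring a sub-window directly from an integral image can be illustrated with a simplified sketch. It assumes each pixel already carries a visual-word index and that the window descriptor is the un-normalized word histogram, so the linear SVM score reduces to a sum of per-pixel weights. The names `build_weight_integral` and `window_score` are hypothetical, and the sketch omits whatever normalization accounts for the additions and products counted in the abstract.

```python
def build_weight_integral(word_map, weights):
    """Integral image of per-pixel SVM weights.

    word_map[y][x] is the (assumed) visual-word index at that pixel and
    weights[k] is the linear SVM weight of word k.
    """
    height, width = len(word_map), len(word_map[0])
    integral = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += weights[word_map[y][x]]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    return integral


def window_score(integral, top, left, bottom, right, bias=0.0):
    """SVM score for rows [top, bottom) and columns [left, right).

    Four lookups replace the histogram pooling, whatever the window size.
    """
    total = (integral[bottom][right] - integral[top][right]
             - integral[bottom][left] + integral[top][left])
    return total + bias
```

Because the score is linear in the histogram, summing the weights of the words inside the window gives the same value as building the histogram first and taking its dot product with the weight vector.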
Address Las Vegas, USA
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-10330-8 Medium
Area Expedition Conference ISVC
Notes ADAS Approved no
Call Number Admin @ si @ ATR2009a Serial 1246
 

 
Author David Aldavert; Ricardo Toledo; Arnau Ramisa; Ramon Lopez de Mantaras
Title Visual Registration Method For A Low Cost Robot: Computer Vision Systems Type Conference Article
Year 2009 Publication 7th International Conference on Computer Vision Systems Abbreviated Journal
Volume 5815 Issue Pages 204–214
Keywords
Abstract An autonomous mobile robot must face the correspondence or data association problem in order to carry out tasks like place recognition or unknown environment mapping. In order to put two maps into correspondence, most methods estimate the transformation relating the maps from matches established between low-level features extracted from sensor data. However, finding explicit matches between features is a challenging and computationally expensive task. In this paper, we propose a new method to align obstacle maps without searching for explicit matches between features. The maps are obtained from a stereo pair. Then, we use a vocabulary tree approach to identify putative corresponding maps, followed by the Newton minimization algorithm to find the transformation that relates both maps. The proposed method is evaluated in a typical office environment, showing good performance.
Address Belgium
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-04666-7 Medium
Area Expedition Conference ICVS
Notes ADAS Approved no
Call Number Admin @ si @ ATR2009b Serial 1247
 

 
Author Arnau Ramisa; Shrihari Vasudevan; David Aldavert; Ricardo Toledo; Ramon Lopez de Mantaras
Title Evaluation of the SIFT Object Recognition Method in Mobile Robots: Frontiers in Artificial Intelligence and Applications Type Conference Article
Year 2009 Publication 12th International Conference of the Catalan Association for Artificial Intelligence Abbreviated Journal
Volume 202 Issue Pages 9-18
Keywords
Abstract General object recognition in mobile robots is of primary importance in order to enhance the representation of the environment that robots will use for their reasoning processes. Therefore, we contribute to reducing this gap by evaluating the SIFT Object Recognition method on a challenging dataset, focusing on issues relevant to mobile robotics. The method was found to be resistant to robotic working conditions, but mainly for well-textured objects.
Address Cardona, Spain
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0922-6389 ISBN 978-1-60750-061-2 Medium
Area Expedition Conference CCIA
Notes ADAS Approved no
Call Number Admin @ si @ RVA2009 Serial 1248
 

 
Author Carlo Gatta; Oriol Pujol; O. Rodriguez-Leor; J. M. Ferre; Petia Radeva
Title Fast Rigid Registration of Vascular Structures in IVUS Sequences Type Journal Article
Year 2009 Publication IEEE Transactions on Information Technology in Biomedicine Abbreviated Journal
Volume 13 Issue 6 Pages 1006-1011
Keywords
Abstract Intravascular ultrasound (IVUS) technology permits visualization of high-resolution images of internal vascular structures. IVUS is a unique image-guiding tool to display a longitudinal view of the vessels and to estimate the length and size of vascular structures with the goal of accurate diagnosis. Unfortunately, due to pulsatile contraction and expansion of the heart, the captured images are affected by different motion artifacts that make visual inspection difficult. In this paper, we propose an efficient algorithm that aligns vascular structures and strongly reduces the saw-shaped oscillation, simplifying the inspection of longitudinal cuts; it reduces the motion artifacts caused by the displacement of the catheter in the short-axis plane and the catheter rotation due to vessel tortuosity. The algorithm prototype aligns 3.16 frames/s and clearly outperforms state-of-the-art methods with similar computational cost. The speed of the algorithm is crucial since it allows the corrected sequence to be inspected during patient intervention. Moreover, we improved an indirect methodology for evaluating IVUS rigid registration algorithms.
Address
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1089-7771 ISBN Medium
Area Expedition Conference
Notes MILAB;HuPBA Approved no
Call Number BCNPCL @ bcnpcl @ GPL2009 Serial 1250
 

 
Author Fosca De Iorio; C. Malagelada; Fernando Azpiroz; M. Maluenda; C. Violanti; Laura Igual; Jordi Vitria; Juan R. Malagelada
Title Intestinal motor activity, endoluminal motion and transit Type Journal Article
Year 2009 Publication Neurogastroenterology & Motility Abbreviated Journal NEUMOT
Volume 21 Issue 12 Pages 1264–e119
Keywords
Abstract A programme for evaluation of intestinal motility has been recently developed based on endoluminal image analysis using computer vision methodology and machine learning techniques. Our aim was to determine the effect of intestinal muscle inhibition on wall motion, dynamics of luminal content and transit in the small bowel. Fourteen healthy subjects ingested the endoscopic capsule (Pillcam, Given Imaging) in fasting conditions. Seven of them received glucagon (4.8 µg kg⁻¹ bolus followed by a 9.6 µg kg⁻¹ h⁻¹ infusion over 1 h), and in the other seven fasting activity was recorded as a control. This dose of glucagon has previously been shown to inhibit both tonic and phasic intestinal motor activity. Endoluminal images and displacement were analyzed by means of a computer vision programme specifically developed for the evaluation of muscular activity (contractile and non-contractile patterns), intestinal contents, endoluminal motion and transit. Thirty-minute periods before, during and after glucagon infusion were analyzed and compared with equivalent periods in controls. No differences were found in the parameters measured during the baseline (pretest) periods when comparing glucagon and control experiments. During glucagon infusion, there was a significant reduction in contractile activity (0.2 ± 0.1 vs 4.2 ± 0.9 luminal closures per min, P < 0.05; 0.4 ± 0.1 vs 3.4 ± 1.2% of images with radial wrinkles, P < 0.05) and a significant reduction of endoluminal motion (82 ± 9 vs 21 ± 10% of static images, P < 0.05). Endoluminal image analysis, by means of computer vision and machine learning techniques, can reliably detect reduced intestinal muscle activity and motion.
Address
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;MILAB;MV Approved no
Call Number BCNPCL @ bcnpcl @ DMA2009 Serial 1251
 

 
Author Oriol Pujol; David Masip
Title Geometry-Based Ensembles: Toward a Structural Characterization of the Classification Boundary Type Journal Article
Year 2009 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume 31 Issue 6 Pages 1140–1146
Keywords
Abstract This article introduces a novel binary discriminative learning technique based on the approximation of the non-linear decision boundary by a piece-wise linear smooth additive model. The decision border is geometrically defined by means of the characterizing boundary points – points that belong to the optimal boundary under a certain notion of robustness. Based on these points, a set of locally robust linear classifiers is defined and assembled by means of a Tikhonov regularized optimization procedure in an additive model to create a final lambda-smooth decision rule. As a result, a very simple and robust classifier with a strong geometrical meaning and non-linear behavior is obtained. The simplicity of the method allows its extension to cope with some of today's machine learning challenges, such as online learning, large-scale learning or parallelization, with linear computational complexity. We validate our approach on the UCI database. Finally, we apply our technique in online and large-scale scenarios, and in six real-life computer vision and pattern recognition problems: gender recognition, intravascular ultrasound tissue classification, speed traffic sign detection, Chagas' disease severity detection, clef classification and action recognition using 3D accelerometer data. The results are promising and this paper opens a line of research that deserves further attention.
Address
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;HuPBA;MV Approved no
Call Number BCNPCL @ bcnpcl @ PuM2009 Serial 1252
 

 
Author J. Oliver; Ricardo Toledo; J. Pujol; J. Sorribes; E. Valderrama
Title Un ABP basado en la robotica para las ingenierias informaticas Type Miscellaneous
Year 2009 Publication 15th Jornadas de Enseñanza Universitaria de la Informatica Abbreviated Journal
Volume Issue Pages 331–338
Keywords
Abstract
Address Barcelona, Spain
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-692-2758-9 Medium
Area Expedition Conference JENUI
Notes ADAS Approved no
Call Number Admin @ si @ OTP2009 Serial 1253
 

 
Author Eduard Vazquez
Title Distribution Characterization using Topological Features. Application to Colour Image Processing Type Report
Year 2007 Publication CVC Technical Report # 107 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Vaz2009 Serial 1254
 

 
Author Oscar Camara; Estanislao Oubel; Gemma Piella; Simone Balocco; Mathieu De Craene; Alejandro F. Frangi
Title Multi-sequence Registration of Cine, Tagged and Delay-Enhancement MRI with Shift Correction and Steerable Pyramid-Based Detagging Type Conference Article
Year 2009 Publication 5th International Conference on Functional Imaging and Modeling of the Heart Abbreviated Journal
Volume 5528 Issue Pages 330–338
Keywords
Abstract In this work, we present a registration framework for cardiac cine MRI (cMRI), tagged MRI (tMRI) and delay-enhancement MRI (deMRI), in which the two main issues in finding an accurate alignment between these images have been taken into account: the presence of tags in tMRI and respiration artifacts in all sequences. A steerable pyramid image decomposition has been used for detagging purposes since it is suitable for extracting high-order oriented structures by directional adaptive filtering. Shift correction of cMRI is achieved by first maximizing the similarity between the Long Axis and Short Axis cMRI. Subsequently, these shift-corrected images are used as target images in a rigid registration procedure with their corresponding tMRI/deMRI in order to correct their shift. The proposed registration framework has been evaluated in 840 registration tests, considerably improving the alignment of the MR images (mean RMS error of 2.04 mm vs. 5.44 mm).
Address Nice, France
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-01931-9 Medium
Area Expedition Conference FIMH
Notes MILAB Approved no
Call Number BCNPCL @ bcnpcl @ COP2009 Serial 1255
 

 
Author Fadi Dornaika; Bogdan Raducanu
Title Simultaneous 3D face pose and person-specific shape estimation from a single image using a holistic approach Type Conference Article
Year 2009 Publication IEEE Workshop on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This paper presents a new approach for the simultaneous estimation of the 3D pose and specific shape of a previously unseen face from a single image. The face pose is not limited to a frontal view. We describe a holistic approach based on a deformable 3D model and a learned statistical facial texture model. Rather than obtaining a person-specific facial surface, the goal of this work is to compute person-specific 3D face shape in terms of a few control parameters that are used by many applications. The proposed holistic approach estimates the 3D pose parameters as well as the face shape control parameters by registering the warped texture to a statistical face texture, which is carried out by a stochastic and genetic optimizer. The proposed approach has several features that make it very attractive: (i) it uses a single grey-scale image, (ii) it is person-independent, (iii) it is featureless (no facial feature extraction is required), and (iv) its learning stage is easy. The proposed approach lends itself nicely to 3D face tracking and face gesture recognition in monocular videos. We describe extensive experiments that show the feasibility and robustness of the proposed approach.
Address Utah, USA
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1550-5790 ISBN 978-1-4244-5497-6 Medium
Area Expedition Conference WACV
Notes OR;MV Approved no
Call Number BCNPCL @ bcnpcl @ DoR2009b Serial 1256
 

 
Author Bogdan Raducanu; Fadi Dornaika
Title Natural Facial Expression Recognition Using Dynamic and Static Schemes Type Conference Article
Year 2009 Publication 5th International Symposium on Visual Computing Abbreviated Journal
Volume 5875 Issue Pages 730–739
Keywords
Abstract Affective computing is at the core of a new paradigm in HCI and AI represented by human-centered computing. Within this paradigm, it is expected that machines will be given perceptual capabilities, making them aware of users' affective state. The current paper addresses the problem of facial expression recognition from monocular video sequences. We propose a dynamic facial expression recognition scheme, which is shown to be very efficient. Furthermore, it is compared with several static-based systems adopting different magnitudes of facial expression. We provide evaluations of performance using Linear Discriminant Analysis (LDA), Non-parametric Discriminant Analysis (NDA), and Support Vector Machines (SVM). We also provide performance evaluations using arbitrary test video sequences.
Address Las Vegas, USA
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-10330-8 Medium
Area Expedition Conference ISVC
Notes OR;MV Approved no
Call Number BCNPCL @ bcnpcl @ RaD2009 Serial 1257
 

 
Author Sergio Escalera; Oriol Pujol; J. Mauri; Petia Radeva
Title Intravascular Ultrasound Tissue Characterization with Sub-class Error-Correcting Output Codes Type Journal Article
Year 2009 Publication Journal of Signal Processing Systems Abbreviated Journal
Volume 55 Issue 1-3 Pages 35–47
Keywords
Abstract Intravascular ultrasound (IVUS) is a powerful imaging technique to explore coronary vessels and to study their morphology and histologic properties. In this paper, we characterize different tissues based on radial frequency, texture-based, and combined features. To deal with the classification of multiple tissues, we require the use of robust multi-class learning techniques. In this sense, error-correcting output codes (ECOC) have been shown to robustly combine binary classifiers to solve multi-class problems. In this context, we propose a strategy to model multi-class classification tasks using sub-class information in the ECOC framework. The new strategy splits the classes into different sub-sets according to the applied base classifier. Complex IVUS data sets containing overlapping data are learnt by splitting the original set of classes into sub-classes and embedding the binary problems in a problem-dependent ECOC design. The method automatically characterizes different tissues, showing performance improvements over the state-of-the-art ECOC techniques for different base classifiers. Furthermore, the combination of RF and texture-based features also shows improvements over the state-of-the-art approaches.
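For readers unfamiliar with how ECOC combines binary classifiers, the decoding step can be sketched as follows. This is generic Hamming decoding over a ternary code matrix, where an entry of 0 means the class takes no part in that binary problem. The function name `ecoc_decode` and the example matrix are assumptions for illustration; the sub-class splitting proposed in the paper is not implemented here.

```python
def ecoc_decode(code_matrix, predictions):
    """Return the index of the class whose codeword best matches the predictions.

    code_matrix[c][j] is +1, -1 or 0 (class c is ignored by binary problem j);
    predictions[j] is the +1/-1 output of binary classifier j.
    """
    best_class, best_distance = None, None
    for class_index, codeword in enumerate(code_matrix):
        # Hamming distance counted only over positions the class takes part in.
        distance = sum(
            1 for code, pred in zip(codeword, predictions)
            if code != 0 and code != pred
        )
        if best_distance is None or distance < best_distance:
            best_class, best_distance = class_index, distance
    return best_class


# One-vs-one design for three classes: columns are (0 vs 1), (0 vs 2), (1 vs 2).
ONE_VS_ONE = [
    [+1, +1, 0],
    [-1, 0, +1],
    [0, -1, -1],
]
```

For example, `ecoc_decode(ONE_VS_ONE, [+1, +1, -1])` selects class 0, whose codeword agrees with both classifiers it takes part in.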
Address
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1939-8018 ISBN Medium
Area Expedition Conference
Notes MILAB;HuPBA Approved no
Call Number BCNPCL @ bcnpcl @ EPM2009 Serial 1258
 

 
Author Anjan Dutta; Zeynep Akata
Title Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval Type Conference Article
Year 2019 Publication 32nd IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 5089-5098
Keywords
Abstract Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision that allows the retrieval of natural images relevant to sketch queries that might not have been seen in the training phase. Existing works require either aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that only requires supervision at the category level, and avoids the need for highly priced aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic space mapping is discriminating. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets.
Address Long Beach; California; USA; June 2019
Corporate Author Thesis
Publisher Place of Publication (up) Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPR
Notes DAG; 600.141; 600.121 Approved no
Call Number Admin @ si @ DuA2019 Serial 3268