Author Muhammad Anwer Rao
Title Color for Object Detection and Action Recognition Type Book Whole
Year 2013 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Recognizing object categories in real world images is a challenging problem in computer vision. The deformable part based framework is currently the most successful approach for object detection. Generally, HOG features are used for image representation within the part-based framework. For action recognition, the bag-of-words framework has been shown to provide promising results. Within the bag-of-words framework, local image patches are described by the SIFT descriptor. In contrast to object detection and action recognition, combining color and shape has been shown to provide the best performance for object and scene recognition.

In the first part of this thesis, we analyze the problem of person detection in still images. Standard person detection approaches rely on intensity-based features for image representation while ignoring color. Channel-based descriptors are among the most commonly used approaches in object recognition. This inspires us to evaluate incorporating color information using the channel-based fusion approach for the task of person detection.

In the second part of the thesis, we investigate the problem of object detection in still images. Due to its high dimensionality, channel-based fusion increases the computational cost. Moreover, channel-based fusion has been found to obtain inferior results for object categories where one of the visual cues varies significantly. On the other hand, late fusion is known to provide improved results for a wide range of object categories. A consequence of the late fusion strategy is the need for a pure color descriptor. Therefore, we propose to use color attributes as an explicit color representation for object detection. Color attributes are compact and computationally efficient. Consequently, color attributes are combined with traditional shape features, providing excellent results for the object detection task.

Finally, we focus on the problem of action detection and classification in still images. We investigate the potential of color for action classification and detection in still images. We also evaluate different fusion approaches for combining color and shape information for action recognition. Additionally, an analysis is performed to validate the contribution of color for action recognition. Our results clearly demonstrate that combining color and shape information significantly improves the performance of both action classification and detection in still images.
Address Barcelona
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Antonio Lopez;Joost Van de Weijer
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Rao2013 Serial 2281
 

 
Author Monica Piñol; Angel Sappa; Ricardo Toledo
Title MultiTable Reinforcement for Visual Object Recognition Type Conference Article
Year 2012 Publication 4th International Conference on Signal and Image Processing Abbreviated Journal
Volume 221 Issue Pages 469-480
Keywords
Abstract This paper presents a bag-of-features based method for visual object recognition. Our contribution is focused on the selection of the best feature descriptor. It is implemented using a novel multi-table reinforcement learning method that selects, among five classical descriptors (i.e., Spin, SIFT, SURF, C-SIFT and PHOW), the one that best describes each image. Experimental results and comparisons are provided showing the improvements achieved with the proposed approach.
Address Coimbatore, India
Corporate Author Thesis
Publisher Springer India Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNEE
Series Volume Series Issue Edition
ISSN 1876-1100 ISBN 978-81-322-0996-6 Medium
Area Expedition Conference ICSIP
Notes ADAS Approved no
Call Number Admin @ si @ PST2012 Serial 2157
 

 
Author Monica Piñol; Angel Sappa; Ricardo Toledo
Title Adaptive Feature Descriptor Selection based on a Multi-Table Reinforcement Learning Strategy Type Journal Article
Year 2015 Publication Neurocomputing Abbreviated Journal NEUCOM
Volume 150 Issue A Pages 106–115
Keywords Reinforcement learning; Q-learning; Bag of features; Descriptors
Abstract This paper presents and evaluates a framework to improve the performance of visual object classification methods that take image feature descriptors as inputs. The goal of the proposed framework is to learn the best descriptor for each image in a given database. This goal is reached by means of a reinforcement learning process using the minimum information. The visual classification system used to demonstrate the proposed framework is based on a bag of features scheme, and the reinforcement learning technique is implemented through the Q-learning approach. The behavior of the reinforcement learning with different state definitions is evaluated. Additionally, a method that combines all these states is formulated in order to select the optimal state. Finally, the chosen actions are obtained from the best set of image descriptors in the literature: PHOW, SIFT, C-SIFT, SURF and Spin. Experimental results using two public databases (ETH and COIL) are provided showing both the validity of the proposed approach and comparisons with the state of the art. In all cases the best results are obtained with the proposed approach.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.055; 600.076 Approved no
Call Number Admin @ si @ PST2015 Serial 2473
 

 
Author Monica Piñol; Angel Sappa; Angeles Lopez; Ricardo Toledo
Title Feature Selection Based on Reinforcement Learning for Object Recognition Type Conference Article
Year 2012 Publication Adaptive Learning Agents Workshop Abbreviated Journal
Volume Issue Pages 33-39
Keywords
Abstract
Address Valencia
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ALA
Notes ADAS; RV Approved no
Call Number Admin @ si @ PSL2012 Serial 2018
 

 
Author Monica Piñol
Title Adaptative Vocabulary Tree for Image Classification using Reinforcement Learning Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume 162 Issue Pages
Keywords
Abstract
Address Bellaterra (Barcelona)
Corporate Author Computer Vision Center Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Piñ2010 Serial 1936
 

 
Author Monica Piñol
Title Reinforcement Learning of Visual Descriptors for Object Recognition Type Book Whole
Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The human visual system is able to recognize the object in an image even if the object is partially occluded, seen from various points of view, shown in different colors, or placed at different distances. To do this, the eye obtains an image and extracts features that are sent to the brain, where the object is recognized. In computer vision, the object recognition branch tries to learn from the behaviour of the human visual system to achieve its goal. Hence, one algorithm is used to identify representative features of the scene (detection), another algorithm is used to describe these points (descriptor), and finally the extracted information is used for classifying the object in the scene. The selection of this set of algorithms is a very complicated task and thus a very active research field. In this thesis we focus on the selection/learning of the best descriptor for a given image. The state of the art offers several descriptors, but we do not know how to choose the best one because it depends on the scenes that we will use (dataset) and on the algorithm chosen to do the classification. We propose a framework based on reinforcement learning and bag of features to choose the best descriptor according to the given image. The system can analyse the behaviour of different learning algorithms and descriptor sets. Furthermore, the proposed framework for improving the classification/recognition ratio can be used with minor changes in other computer vision fields, such as video retrieval.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Ricardo Toledo;Angel Sappa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-5-7 Medium
Area Expedition Conference
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ Piñ2014 Serial 2464
 

 
Author Mohammed Al Rawi; Ernest Valveny; Dimosthenis Karatzas
Title Can One Deep Learning Model Learn Script-Independent Multilingual Word-Spotting? Type Conference Article
Year 2019 Publication 15th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 260-267
Keywords
Abstract Word spotting has gained increased attention lately as it can be used to extract textual information from handwritten documents and scene-text images. Current word spotting approaches are designed to work on a single language and/or script. Building intelligent models that learn script-independent multilingual word-spotting is challenging due to the large variability of multilingual alphabets and symbols. We used ResNet-152 and the Pyramidal Histogram of Characters (PHOC) embedding to build a one-model script-independent multilingual word-spotting system, and we tested it on Latin, Arabic, and Bangla (Indian) languages. The single model we propose performs on par with the multi-model language-specific word-spotting system and thus reduces the number of models needed for each script and/or language.
Address Sydney; Australia; September 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.129; 600.121 Approved no
Call Number Admin @ si @ RVK2019 Serial 3337
 

 
Author Mohammed Al Rawi; Ernest Valveny
Title Compact and Efficient Multitask Learning in Vision, Language and Speech Type Conference Article
Year 2019 Publication IEEE International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 2933-2942
Keywords
Abstract Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains.
Address Seoul; Korea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ RaV2019 Serial 3365
 

 
Author Mohammed Al Rawi; Dimosthenis Karatzas
Title On the Labeling Correctness in Computer Vision Datasets Type Conference Article
Year 2018 Publication Proceedings of the Workshop on Interactive Adaptive Learning, co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Image datasets have been heavily used to build computer vision systems. These datasets are either manually or automatically labeled, which is a problem as both labeling methods are prone to errors. To investigate this problem, we use a majority voting ensemble that combines the results from several Convolutional Neural Networks (CNNs). Majority voting ensembles not only enhance the overall performance, but can also be used to estimate the confidence level of each sample. We also examined Softmax as another way to estimate posterior probability. We have designed various experiments with a range of different ensembles built from one or different, or temporal/snapshot CNNs, which have been trained multiple times stochastically. We analyzed the CIFAR10, CIFAR100, EMNIST, and SVHN datasets and found quite a few incorrect labels, both in the training and testing sets. We also present a detailed confidence analysis on these datasets and found that the ensemble is better than Softmax when used to estimate the per-sample confidence. This work thus proposes an approach that can be used to scrutinize and verify the labeling of computer vision datasets, which can later be applied to weakly/semi-supervised learning. We propose a measure, based on the Odds-Ratio, to quantify how many of these incorrectly classified labels are actually incorrectly labeled and how many of these are confusing. The proposed methods are easily scalable to larger datasets, like ImageNet, LSUN and SUN, as each CNN instance is trained for 60 epochs; or even faster, by implementing a temporal (snapshot) ensemble.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECML-PKDDW
Notes DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ RaK2018 Serial 3144
 

 
Author Mohammad Rouhani; E. Boyer; Angel Sappa
Title Non-Rigid Registration meets Surface Reconstruction Type Conference Article
Year 2014 Publication International Conference on 3D Vision Abbreviated Journal
Volume Issue Pages 617-624
Keywords
Abstract Non-rigid registration is an important task in computer vision with many applications in shape and motion modeling. A fundamental step of the registration is the data association between the source and the target sets. Such association proves difficult in practice, due to the discrete nature of the information and its corruption by various types of noise, e.g. outliers and missing data. In this paper we investigate the benefit of implicit representations for the non-rigid registration of 3D point clouds. First, the target points are described with small quadratic patches that are blended through partition of unity weighting. Then, the discrete association between the source and the target can be replaced by a continuous distance field induced by the interface. By combining this distance field with a proper deformation term, the registration energy can be expressed in a linear least squares form that is easy and fast to solve. This significantly eases the registration by avoiding direct association between points. Moreover, a hierarchical approach can be easily implemented by employing coarse-to-fine representations. Experimental results are provided for point clouds from multi-view data sets. The qualitative and quantitative comparisons show the superior performance and robustness of our framework.
Address Tokyo; Japan; December 2014
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference 3DV
Notes ADAS; 600.055; 600.076 Approved no
Call Number Admin @ si @ RBS2014 Serial 2534
 

 
Author Mohammad Rouhani; Angel Sappa; E. Boyer
Title Implicit B-Spline Surface Reconstruction Type Journal Article
Year 2015 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP
Volume 24 Issue 1 Pages 22 - 32
Keywords
Abstract This paper presents a fast and flexible curve and surface reconstruction technique based on implicit B-splines. This representation does not require any parameterization and it is locally supported. This fact has been exploited in this paper to propose a reconstruction technique through solving a sparse system of equations. This method is further accelerated by reducing the dimension to the active control lattice. Moreover, surface smoothness and user interaction can be used to control the surface. Finally, a novel weighting technique has been introduced in order to blend small patches and smooth them in the overlapping regions. The whole framework is very fast and efficient and can handle large point clouds with very low computational cost. The experimental results show the flexibility and accuracy of the proposed algorithm to describe objects with complex topologies. Comparisons with other fitting methods highlight the superiority of the proposed approach in the presence of noise and missing data.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1057-7149 ISBN Medium
Area Expedition Conference
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ RSB2015 Serial 2541
 

 
Author Mohammad Rouhani; Angel Sappa
Title A Novel Approach to Geometric Fitting of Implicit Quadrics Type Conference Article
Year 2009 Publication 8th International Conference on Advanced Concepts for Intelligent Vision Systems Abbreviated Journal
Volume 5807 Issue Pages 121–132
Keywords
Abstract This paper presents a novel approach for estimating the geometric distance from a given point to the corresponding implicit quadric curve/surface. The proposed estimation is based on the height of a tetrahedron, which is used as a coarse but reliable estimation of the real distance. The estimated distance is then used for finding the best set of quadric parameters by means of the Levenberg-Marquardt algorithm, which is a common framework in other geometric fitting approaches. Comparisons of the proposed approach with previous ones are provided to show improvements in both CPU time and the accuracy of the obtained results.
Address Bordeaux, France
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-04696-4 Medium
Area Expedition Conference ACIVS
Notes ADAS Approved no
Call Number ADAS @ adas @ RoS2009 Serial 1194
 

 
Author Mohammad Rouhani; Angel Sappa
Title Relaxing the 3L Algorithm for an Accurate Implicit Polynomial Fitting Type Conference Article
Year 2010 Publication 23rd IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 3066-3072
Keywords
Abstract This paper presents a novel method to increase the accuracy of linear fitting of implicit polynomials. The proposed method is based on the 3L algorithm philosophy. The novelty lies in the relaxation of the additional constraints already imposed by the 3L algorithm. Hence, the accuracy of the final solution is increased due to the proper adjustment of the expected values in the aforementioned additional constraints. Although iterative, the proposed approach solves the fitting problem within a linear framework, which is independent of the threshold tuning. Experimental results, both in 2D and 3D, showing improvements in the accuracy of the fitting are presented. Comparisons with both state of the art algorithms and a geometric based one (non-linear fitting), which is used as a ground truth, are provided.
Address San Francisco; CA; USA; June 2010
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1063-6919 ISBN 978-1-4244-6984-0 Medium
Area Expedition Conference CVPR
Notes ADAS Approved no
Call Number ADAS @ adas @ RoS2010a Serial 1303
 

 
Author Mohammad Rouhani; Angel Sappa
Title A Fast accurate Implicit Polynomial Fitting Approach Type Conference Article
Year 2010 Publication 17th IEEE International Conference on Image Processing Abbreviated Journal
Volume Issue Pages 1429–1432
Keywords
Abstract This paper presents a novel hybrid approach that combines state of the art fitting algorithms: algebraic-based and geometric-based. It consists of two steps: first, the 3L algorithm is used as an initialization, and then the obtained result is improved through a geometric approach. The adopted geometric approach is based on a distance estimation that avoids the costly search for the real orthogonal distance. Experimental results are presented as well as quantitative comparisons.
Address Hong Kong
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1522-4880 ISBN 978-1-4244-7992-4 Medium
Area Expedition Conference ICIP
Notes ADAS Approved no
Call Number ADAS @ adas @ RoS2010b Serial 1359
 

 
Author Mohammad Rouhani; Angel Sappa
Title Implicit B-Spline Fitting Using the 3L Algorithm Type Conference Article
Year 2011 Publication 18th IEEE International Conference on Image Processing Abbreviated Journal
Volume Issue Pages 893-896
Keywords
Abstract
Address Brussels, Belgium
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICIP
Notes ADAS Approved no
Call Number Admin @ si @ RoS2011a; ADAS @ adas @ Serial 1782