|
Records |
Links |
|
Author |
Ernest Valveny; Oriol Ramos Terrades; Joan Mas; Marçal Rusiñol |
|
|
Title |
Interactive Document Retrieval and Classification. |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Multimodal Interaction in Image and Video Applications |
Abbreviated Journal |
|
|
|
Volume |
48 |
Issue |
|
Pages |
17-30 |
|
|
Keywords |
|
|
|
Abstract |
In this chapter we describe a system for document retrieval and classification following the interactive-predictive framework. In particular, the system addresses two different scenarios of document analysis: document classification based on visual appearance and logo detection. These two classical problems of document analysis are formulated following the interactive-predictive model, taking the user interaction into account to make easier the process of annotating and labelling the documents. A system implementing this model in a real scenario is presented and analyzed. This system also takes advantage of active learning techniques to speed up the task of labelling the documents. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
Angel Sappa; Jordi Vitria |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1868-4394 |
ISBN |
978-3-642-35931-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ VRM2013 |
Serial |
2341 |
|
Permanent link to this record |
|
|
|
|
Author |
Lluis Pere de las Heras; Joan Mas; Gemma Sanchez; Ernest Valveny |
|
|
Title |
Notation-invariant patch-based wall detector in architectural floor plans |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Graphics Recognition. New Trends and Challenges |
Abbreviated Journal |
|
|
|
Volume |
7423 |
Issue |
|
Pages |
79--88 |
|
|
Keywords |
|
|
|
Abstract |
Architectural floor plans exhibit a large variability in notation. Therefore, segmenting and identifying the elements of any kind of plan becomes a challenging task for approaches based on grouping structural primitives obtained by vectorization. Recently, a patch-based segmentation method working at pixel level and relying on the construction of a visual vocabulary has been proposed in [1], showing its adaptability to different notations by automatically learning the visual appearance of the elements in each different notation. This paper presents an evolution of that previous work, after analyzing and testing several alternatives for each of the different steps of the method: Firstly, an automatic plan-size normalization process is done. Secondly we evaluate different features to obtain the description of every patch. Thirdly, we train an SVM classifier to obtain the category of every patch instead of constructing a visual vocabulary. These variations of the method have been tested for wall detection on two datasets of architectural floor plans with different notations. After studying in deep each of the steps in the process pipeline, we are able to find the best system configuration, which highly outperforms the results on wall segmentation obtained by the original paper. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-36823-3 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.045; 600.056; 605.203 |
Approved |
no |
|
|
Call Number |
Admin @ si @ HMS2013 |
Serial |
2322 |
|
Permanent link to this record |
|
|
|
|
Author |
Joost Van de Weijer; Fahad Shahbaz Khan; Marc Masana |
|
|
Title |
Interactive Visual and Semantic Image Retrieval |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Multimodal Interaction in Image and Video Applications |
Abbreviated Journal |
|
|
|
Volume |
48 |
Issue |
|
Pages |
31-35 |
|
|
Keywords |
|
|
|
Abstract |
One direct consequence of recent advances in digital visual data generation and the direct availability of this information through the World-Wide Web, is a urgent demand for efficient image retrieval systems. The objective of image retrieval is to allow users to efficiently browse through this abundance of images. Due to the non-expert nature of the majority of the internet users, such systems should be user friendly, and therefore avoid complex user interfaces. In this chapter we investigate how high-level information provided by recently developed object recognition techniques can improve interactive image retrieval. Wel apply a bagof- word based image representation method to automatically classify images in a number of categories. These additional labels are then applied to improve the image retrieval system. Next to these high-level semantic labels, we also apply a low-level image description to describe the composition and color scheme of the scene. Both descriptions are incorporated in a user feedback image retrieval setting. The main objective is to show that automatic labeling of images with semantic labels can improve image retrieval results. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
Angel Sappa; Jordi Vitria |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1868-4394 |
ISBN |
978-3-642-35931-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
CIC; 605.203; 600.048 |
Approved |
no |
|
|
Call Number |
Admin @ si @ WKC2013 |
Serial |
2284 |
|
Permanent link to this record |
|
|
|
|
Author |
Carles Fernandez; Jordi Gonzalez; Joao Manuel R. S. Taveres; Xavier Roca |
|
|
Title |
Towards Ontological Cognitive System |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Topics in Medical Image Processing and Computational Vision |
Abbreviated Journal |
|
|
|
Volume |
8 |
Issue |
|
Pages |
87-99 |
|
|
Keywords |
|
|
|
Abstract |
The increasing ubiquitousness of digital information in our daily lives has positioned video as a favored information vehicle, and given rise to an astonishing generation of social media and surveillance footage. This raises a series of technological demands for automatic video understanding and management, which together with the compromising attentional limitations of human operators, have motivated the research community to guide its steps towards a better attainment of such capabilities. As a result, current trends on cognitive vision promise to recognize complex events and self-adapt to different environments, while managing and integrating several types of knowledge. Future directions suggest to reinforce the multi-modal fusion of information sources and the communication with end-users. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Netherlands |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
2212-9391 |
ISBN |
978-94-007-0725-2 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE; 605.203; 302.018; 600.049 |
Approved |
no |
|
|
Call Number |
Admin @ si @ FGT2013 |
Serial |
2287 |
|
Permanent link to this record |
|
|
|
|
Author |
Muhammad Muzzamil Luqman; Jean-Yves Ramel; Josep Llados |
|
|
Title |
Multilevel Analysis of Attributed Graphs for Explicit Graph Embedding in Vector Spaces |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Graph Embedding for Pattern Analysis |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1-26 |
|
|
Keywords |
|
|
|
Abstract |
Ability to recognize patterns is among the most crucial capabilities of human beings for their survival, which enables them to employ their sophisticated neural and cognitive systems [1], for processing complex audio, visual, smell, touch, and taste signals. Man is the most complex and the best existing system of pattern recognition. Without any explicit thinking, we continuously compare, classify, and identify huge amount of signal data everyday [2], starting from the time we get up in the morning till the last second we fall asleep. This includes recognizing the face of a friend in a crowd, a spoken word embedded in noise, the proper key to lock the door, smell of coffee, the voice of a favorite singer, the recognition of alphabetic characters, and millions of more tasks that we perform on regular basis. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer New York |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4614-4456-5 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ LRL2013b |
Serial |
2271 |
|
Permanent link to this record |
|
|
|
|
Author |
Abel Gonzalez-Garcia; Robert Benavente; Olivier Penacchio; Javier Vazquez; Maria Vanrell; C. Alejandro Parraga |
|
|
Title |
Coloresia: An Interactive Colour Perception Device for the Visually Impaired |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Multimodal Interaction in Image and Video Applications |
Abbreviated Journal |
|
|
|
Volume |
48 |
Issue |
|
Pages |
47-66 |
|
|
Keywords |
|
|
|
Abstract |
A significative percentage of the human population suffer from impairments in their capacity to distinguish or even see colours. For them, everyday tasks like navigating through a train or metro network map becomes demanding. We present a novel technique for extracting colour information from everyday natural stimuli and presenting it to visually impaired users as pleasant, non-invasive sound. This technique was implemented inside a Personal Digital Assistant (PDA) portable device. In this implementation, colour information is extracted from the input image and categorised according to how human observers segment the colour space. This information is subsequently converted into sound and sent to the user via speakers or headphones. In the original implementation, it is possible for the user to send its feedback to reconfigure the system, however several features such as these were not implemented because the current technology is limited.We are confident that the full implementation will be possible in the near future as PDA technology improves. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1868-4394 |
ISBN |
978-3-642-35931-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
CIC; 600.052; 605.203 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GBP2013 |
Serial |
2266 |
|
Permanent link to this record |
|
|
|
|
Author |
Michal Drozdzal; Santiago Segui; Petia Radeva; Carolina Malagelada; Fernando Azpiroz; Jordi Vitria |
|
|
Title |
An Application for Efficient Error-Free Labeling of Medical Images |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Multimodal Interaction in Image and Video Applications |
Abbreviated Journal |
|
|
|
Volume |
48 |
Issue |
|
Pages |
1-16 |
|
|
Keywords |
|
|
|
Abstract |
In this chapter we describe an application for efficient error-free labeling of medical images. In this scenario, the compilation of a complete training set for building a realistic model of a given class of samples is not an easy task, making the process tedious and time consuming. For this reason, there is a need for interactive labeling applications that minimize the effort of the user while providing error-free labeling. We propose a new algorithm that is based on data similarity in feature space. This method actively explores data in order to find the best label-aligned clustering and exploits it to reduce the labeler effort, that is measured by the number of “clicks. Moreover, error-free labeling is guaranteed by the fact that all data and their labels proposals are visually revised by en expert. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1868-4394 |
ISBN |
978-3-642-35931-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; OR;MV |
Approved |
no |
|
|
Call Number |
Admin @ si @ DSR2013 |
Serial |
2235 |
|
Permanent link to this record |
|
|
|
|
Author |
Marc Castello; Jordi Gonzalez; Ariel Amato; Pau Baiget; Carles Fernandez; Josep M. Gonfaus; Ramon Mollineda; Marco Pedersoli; Nicolas Perez de la Blanca; Xavier Roca |
|
|
Title |
Exploiting Multimodal Interaction Techniques for Video-Surveillance |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Multimodal Interaction in Image and Video Applications Intelligent Systems Reference Library |
Abbreviated Journal |
|
|
|
Volume |
48 |
Issue |
8 |
Pages |
135-151 |
|
|
Keywords |
|
|
|
Abstract |
In this paper we present an example of a video surveillance application that exploits Multimodal Interactive (MI) technologies. The main objective of the so-called VID-Hum prototype was to develop a cognitive artificial system for both the detection and description of a particular set of human behaviours arising from real-world events. The main procedure of the prototype described in this chapter entails: (i) adaptation, since the system adapts itself to the most common behaviours (qualitative data) inferred from tracking (quantitative data) thus being able to recognize abnormal behaviors; (ii) feedback, since an advanced interface based on Natural Language understanding allows end-users the communicationwith the prototype by means of conceptual sentences; and (iii) multimodality, since a virtual avatar has been designed to describe what is happening in the scene, based on those textual interpretations generated by the prototype. Thus, the MI methodology has provided an adequate framework for all these cooperating processes. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1868-4394 |
ISBN |
978-3-642-35931-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE; 605.203; 600.049 |
Approved |
no |
|
|
Call Number |
CGA2013 |
Serial |
2222 |
|
Permanent link to this record |
|
|
|
|
Author |
David Vazquez; Antonio Lopez; Daniel Ponsa; David Geronimo |
|
|
Title |
Interactive Training of Human Detectors |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Multiodal Interaction in Image and Video Applications |
Abbreviated Journal |
|
|
|
Volume |
48 |
Issue |
|
Pages |
169-182 |
|
|
Keywords |
Pedestrian Detection; Virtual World; AdaBoost; Domain Adaptation |
|
|
Abstract |
Image based human detection remains as a challenging problem. Most promising detectors rely on classifiers trained with labelled samples. However, labelling is a manual labor intensive step. To overcome this problem we propose to collect images of pedestrians from a virtual city, i.e., with automatic labels, and train a pedestrian detector with them, which works fine when such virtual-world data are similar to testing one, i.e., real-world pedestrians in urban areas. When testing data is acquired in different conditions than training one, e.g., human detection in personal photo albums, dataset shift appears. In previous work, we cast this problem as one of domain adaptation and solve it with an active learning procedure. In this work, we focus on the same problem but evaluating a different set of faster to compute features, i.e., Haar, EOH and their combination. In particular, we train a classifier with virtual-world data, using such features and Real AdaBoost as learning machine. This classifier is applied to real-world training images. Then, a human oracle interactively corrects the wrong detections, i.e., few miss detections are manually annotated and some false ones are pointed out too. A low amount of manual annotation is fixed as restriction. Real- and virtual-world difficult samples are combined within what we call cool world and we retrain the classifier with this data. Our experiments show that this adapted classifier is equivalent to the one trained with only real-world data but requiring 90% less manual annotations. |
|
|
Address |
Springer Heidelberg New York Dordrecht London |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
English |
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1868-4394 |
ISBN |
978-3-642-35931-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS; 600.057; 600.054; 605.203 |
Approved |
no |
|
|
Call Number |
VLP2013; ADAS @ adas @ vlp2013 |
Serial |
2193 |
|
Permanent link to this record |
|
|
|
|
Author |
Miquel Ferrer; I. Bardaji; Ernest Valveny; Dimosthenis Karatzas; Horst Bunke |
|
|
Title |
Median Graph Computation by Means of Graph Embedding into Vector Spaces |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Graph Embedding for Pattern Analysis |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
45-72 |
|
|
Keywords |
|
|
|
Abstract |
In pattern recognition [8, 14], a key issue to be addressed when designing a system is how to represent input patterns. Feature vectors is a common option. That is, a set of numerical features describing relevant properties of the pattern are computed and arranged in a vector form. The main advantages of this kind of representation are computational simplicity and a well sound mathematical foundation. Thus, a large number of operations are available to work with vectors and a large repository of algorithms for pattern analysis and classification exist. However, the simple structure of feature vectors might not be the best option for complex patterns where nonnumerical features or relations between different parts of the pattern become relevant. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer New York |
Place of Publication |
|
Editor |
Yun Fu; Yungian Ma |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4614-4456-5 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ FBV2013 |
Serial |
2421 |
|
Permanent link to this record |
|
|
|
|
Author |
Jorge Bernal; David Vazquez (eds) |
|
|
Title |
Computer vision Trends and Challenges |
Type |
Book Whole |
|
Year |
2013 |
Publication |
Computer vision Trends and Challenges |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
CVCRD; Computer Vision |
|
|
Abstract |
This book contains the papers presented at the Eighth CVC Workshop on Computer Vision Trends and Challenges (CVCR&D'2013). The workshop was held at the Computer Vision Center (Universitat Autònoma de Barcelona), the October 25th, 2013. The CVC workshops provide an excellent opportunity for young researchers and project engineers to share new ideas and knowledge about the progress of their work, and also, to discuss about challenges and future perspectives. In addition, the workshop is the welcome event for new people that recently have joined the institute.
The program of CVCR&D is organized in a single-track single-day workshop. It comprises several sessions dedicated to specific topics. For each session, a doctor working on the topic introduces the general research lines. The PhD students expose their specific research. A poster session will be held for open questions. Session topics cover the current research lines and development projects of the CVC: Medical Imaging, Medical Imaging, Color & Texture Analysis, Object Recognition, Image Sequence Evaluation, Advanced Driver Assistance Systems, Machine Vision, Document Analysis, Pattern Recognition and Applications. We want to thank all paper authors and Program Committee members. Their contribution shows that the CVC has a dynamic, active, and promising scientific community.
We hope you all enjoy this Eighth workshop and we are looking forward to meeting you and new people next year in the Ninth CVCR&D. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
Jorge Bernal; David Vazquez |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-940902-2-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
|
Approved |
no |
|
|
Call Number |
ADAS @ adas @ BeV2013 |
Serial |
2339 |
|
Permanent link to this record |
|
|
|
|
Author |
Muhammad Anwer Rao |
|
|
Title |
Color for Object Detection and Action Recognition |
Type |
Book Whole |
|
Year |
2013 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Recognizing object categories in real world images is a challenging problem in computer vision. The deformable part based framework is currently the most successful approach for object detection. Generally, HOG are used for image representation within the part-based framework. For action recognition, the bag-of-word framework has shown to provide promising results. Within the bag-of-words framework, local image patches are described by SIFT descriptor. Contrary to object detection and action recognition, combining color and shape has shown to provide the best performance for object and scene recognition.
In the first part of this thesis, we analyze the problem of person detection in still images. Standard person detection approaches rely on intensity based features for image representation while ignoring the color. Channel based descriptors is one of the most commonly used approaches in object recognition. This inspires us to evaluate incorporating color information using the channel based fusion approach for the task of person detection.
In the second part of the thesis, we investigate the problem of object detection in still images. Due to high dimensionality, channel based fusion increases the computational cost. Moreover, channel based fusion has been found to obtain inferior results for object category where one of the visual varies significantly. On the other hand, late fusion is known to provide improved results for a wide range of object categories. A consequence of late fusion strategy is the need of a pure color descriptor. Therefore, we propose to use Color attributes as an explicit color representation for object detection. Color attributes are compact and computationally efficient. Consequently color attributes are combined with traditional shape features providing excellent results for object detection task.
Finally, we focus on the problem of action detection and classification in still images. We investigate the potential of color for action classification and detection in still images. We also evaluate different fusion approaches for combining color and shape information for action recognition. Additionally, an analysis is performed to validate the contribution of color for action recognition. Our results clearly demonstrate that combining color and shape information significantly improve the performance of both action classification and detection in still images. |
|
|
Address |
Barcelona |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Antonio Lopez;Joost Van de Weijer |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
Admin @ si @ Rao2013 |
Serial |
2281 |
|
Permanent link to this record |
|
|
|
|
Author |
Javier Marin |
|
|
Title |
Pedestrian Detection Based on Local Experts |
Type |
Book Whole |
|
Year |
2013 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
During the last decade vision-based human detection systems have started to play a key rolein multiple applications linked to driver assistance, surveillance, robot sensing and home automation.
Detecting humans is by far one of the most challenging tasks in Computer Vision.
This is mainly due to the high degree of variability in the human appearanceassociated to
the clothing, pose, shape and size. Besides, other factors such as cluttered scenarios, partial occlusions, or environmental conditions can make the detection task even harder.
Most promising methods of the state-of-the-art rely on discriminative learning paradigms which are fed with positive and negative examples. The training data is one of the most
relevant elements in order to build a robust detector as it has to cope the large variability of the target. In order to create this dataset human supervision is required. The drawback at this point is the arduous effort of annotating as well as looking for such claimed variability.
In this PhD thesis we address two recurrent problems in the literature. In the first stage,we aim to reduce the consuming task of annotating, namely, by using computer graphics.
More concretely, we develop a virtual urban scenario for later generating a pedestrian dataset.
Then, we train a detector using this dataset, and finally we assess if this detector can be successfully applied in a real scenario.
In the second stage, we focus on increasing the robustness of our pedestrian detectors
under partial occlusions. In particular, we present a novel occlusion handling approach to increase the performance of block-based holistic methods under partial occlusions. For this purpose, we make use of local experts via a RandomSubspaceMethod (RSM) to handle these cases. If the method infers a possible partial occlusion, then the RSM, based on performance statistics obtained from partially occluded data, is applied. The last objective of this thesis
is to propose a robust pedestrian detector based on an ensemble of local experts. To achieve this goal, we use the random forest paradigm, where the trees act as ensembles an their nodesare the local experts. In particular, each expert focus on performing a robust classification ofa pedestrian body patch. This approach offers computational efficiency and far less design complexity when compared to other state-of-the-artmethods, while reaching better accuracy |
|
|
Address |
Barcelona |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Antonio Lopez;Jaume Amores |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
Admin @ si @ Mar2013 |
Serial |
2280 |
|
Permanent link to this record |
|
|
|
|
Author |
Wenjuan Gong |
|
|
Title |
3D Motion Data aided Human Action Recognition and Pose Estimation |
Type |
Book Whole |
|
Year |
2013 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
In this work, we explore human action recognition and pose estimation prob-
lems. Different from traditional works of learning from 2D images or video
sequences and their annotated output, we seek to solve the problems with ad-
ditional 3D motion capture information, which helps to fill the gap between 2D
image features and human interpretations.
We first compare two different schools of approaches commonly used for 3D
pose estimation from 2D pose configuration: modeling and learning methods.
By looking into experiments results and considering our problems, we fixed a
learning method as the following approaches to do pose estimation. We then
establish a framework by adding a module of detecting 2D pose configuration
from images with varied background, which widely extend the application of
the approach. We also seek to directly estimate 3D poses from image features,
instead of estimating 2D poses as a intermediate module. We explore a robust
input feature, which combined with the proposed distance measure, provides
a solution for noisy or corrupted inputs. We further utilize the above method
to estimate weak poses,which is a concise representation of the original poses
by using dimension deduction technologies, from image features. Weak pose
space is where we calculate vocabulary and label action types using a bog of
words pipeline. Temporal information of an action is taken into consideration by
considering several consecutive frames as a single unit for computing vocabulary
and histogram assignments. |
|
|
Address |
Barcelona |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Jordi Gonzalez;Xavier Roca |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE |
Approved |
no |
|
|
Call Number |
Admin @ si @ Gon2013 |
Serial |
2279 |
|
Permanent link to this record |
|
|
|
|
Author |
Murad Al Haj |
|
|
Title |
Looking at Faces: Detection, Tracking and Pose Estimation |
Type |
Book Whole |
|
Year |
2013 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Humans can effortlessly perceive faces, follow them over space and time, and decode their rich content, such as pose, identity and expression. However, despite many decades of research on automatic facial perception in areas like face detection, expression recognition, pose estimation and face recognition, and despite many successes, a complete solution remains elusive. This thesis is dedicated to three problems in automatic face perception, namely face detection, face tracking and pose estimation.
In face detection, an initial simple model is presented that uses pixel-based heuristics to segment skin locations and hand-crafted rules to determine the locations of the faces present in an image. Different colorspaces are studied to judge whether a colorspace transformation can aid skin color detection. The output of this study is used in the design of a more complex face detector that is able to successfully generalize to different scenarios.
In face tracking, a framework that combines estimation and control in a joint scheme is presented to track a face with a single pan-tilt-zoom camera. While this work is mainly motivated by tracking faces, it can be easily applied atop of any detector to track different objects. The applicability of this method is demonstrated on simulated as well as real-life scenarios.
The last and most important part of this thesis is dedicate to monocular head pose estimation. In this part, a method based on partial least squares (PLS) regression is proposed to estimate pose and solve the alignment problem simultaneously. The contributions of this work are two-fold: 1) demonstrating that the proposed method achieves better than state-of-the-art results on the estimation problem and 2) developing a technique to reduce misalignment based on the learned PLS factors that outperform multiple instance learning (MIL) without the need for any re-training or the inclusion of misaligned samples in the training process, as normally done in MIL. |
|
|
Address |
Barcelona |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Jordi Gonzalez;Xavier Roca |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE |
Approved |
no |
|
|
Call Number |
Admin @ si @ Haj2013 |
Serial |
2278 |
|
Permanent link to this record |