Records
Author Ariel Amato; Mikhail Mozerov; Andrew Bagdanov; Jordi Gonzalez
Title Accurate Moving Cast Shadow Suppression Based on Local Color Constancy Detection Type Journal Article
Year 2011 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP
Volume 20 Issue 10 Pages 2954-2966
Keywords
Abstract This paper describes a novel framework for detection and suppression of properly shadowed regions for most possible scenarios occurring in real video sequences. Our approach requires no prior knowledge about the scene, nor is it restricted to specific scene structures. Furthermore, the technique can detect both achromatic and chromatic shadows even in the presence of camouflage that occurs when foreground regions are very similar in color to shadowed regions. The method exploits local color constancy properties due to reflectance suppression over shadowed regions. To detect shadowed regions in a scene, the values of the background image are divided by values of the current frame in the RGB color space. We show how this luminance ratio can be used to identify segments with low gradient constancy, which in turn distinguish shadows from foreground. Experimental results on a collection of publicly available datasets illustrate the superior performance of our method compared with the most sophisticated, state-of-the-art shadow detection algorithms. These results show that our approach is robust and accurate over a broad range of shadow types and challenging video conditions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1057-7149 ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ AMB2011 Serial 1716
 

 
Author Carles Fernandez; Pau Baiget; Xavier Roca; Jordi Gonzalez
Title Determining the Best Suited Semantic Events for Cognitive Surveillance Type Journal Article
Year 2011 Publication Expert Systems with Applications Abbreviated Journal EXSY
Volume 38 Issue 4 Pages 4068–4079
Keywords Cognitive surveillance; Event modeling; Content-based video retrieval; Ontologies; Advanced user interfaces
Abstract State-of-the-art systems on cognitive surveillance identify and describe complex events in selected domains, thus providing end-users with tools to easily access the contents of massive video footage. Nevertheless, as the complexity of events increases in semantics and the types of indoor/outdoor scenarios diversify, it becomes difficult to assess which events better describe the scene, and how to model them at a pixel level to fulfill natural language requests. We present an ontology-based methodology that guides the identification, step-by-step modeling, and generalization of the most relevant events to a specific domain. Our approach considers three steps: (1) end-users provide textual evidence from surveilled video sequences; (2) transcriptions are analyzed top-down to build the knowledge bases for event description; and (3) the obtained models are used to generalize event detection to different image sequences from the surveillance domain. This framework produces user-oriented knowledge that improves on existing advanced interfaces for video indexing and retrieval, by determining the best suited events for video understanding according to end-users. We have conducted experiments with outdoor and indoor scenes showing thefts, chases, and vandalism, demonstrating the feasibility and generalization of this proposal.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ FBR2011a Serial 1722
 

 
Author Carles Fernandez; Pau Baiget; Xavier Roca; Jordi Gonzalez
Title Augmenting Video Surveillance Footage with Virtual Agents for Incremental Event Evaluation Type Journal Article
Year 2011 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 32 Issue 6 Pages 878–889
Keywords
Abstract The fields of segmentation, tracking and behavior analysis demand challenging video resources to test, in a scalable manner, complex scenarios like crowded environments or scenes with high semantics. Nevertheless, existing public databases cannot scale the presence of appearing agents, which would be useful to study long-term occlusions and crowds. Moreover, creating these resources is expensive and often too particularized to specific needs. We propose an augmented reality framework to increase the complexity of image sequences in terms of occlusions and crowds, in a scalable and controllable manner. Existing datasets can be increased with augmented sequences containing virtual agents. Such sequences are automatically annotated, thus facilitating evaluation in terms of segmentation, tracking, and behavior recognition. In order to easily specify the desired contents, we propose a natural language interface to convert input sentences into virtual agent behaviors. Experimental tests and validation in indoor, street, and soccer environments are provided to show the feasibility of the proposed approach in terms of robustness, scalability, and semantics.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ FBR2011b Serial 1723
 

 
Author Arjan Gijsenij; Theo Gevers
Title Color Constancy Using Natural Image Statistics and Scene Semantics Type Journal Article
Year 2011 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume 33 Issue 4 Pages 687-698
Keywords
Abstract Existing color constancy methods are all based on specific assumptions such as the spatial and spectral characteristics of images. As a consequence, no algorithm can be considered as universal. However, with the large variety of available methods, the question is how to select the method that performs best for a specific image. To achieve selection and combining of color constancy algorithms, in this paper natural image statistics are used to identify the most important characteristics of color images. Then, based on these image characteristics, the proper color constancy algorithm (or best combination of algorithms) is selected for a specific image. To capture the image characteristics, the Weibull parameterization (e.g., grain size and contrast) is used. It is shown that the Weibull parameterization is related to the image attributes to which the used color constancy methods are sensitive. An MoG-classifier is used to learn the correlation and weighting between the Weibull-parameters and the image attributes (number of edges, amount of texture, and SNR). The output of the classifier is the selection of the best performing color constancy method for a certain image. Experimental results show a large improvement over state-of-the-art single algorithms. On a data set consisting of more than 11,000 images, an increase in color constancy performance up to 20 percent (median angular error) can be obtained compared to the best-performing single algorithm. Further, it is shown that for certain scene categories, one specific color constancy algorithm can be used instead of the classifier considering several algorithms.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0162-8828 ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ GiG2011 Serial 1724
 

 
Author Albert Ali Salah; Theo Gevers; Nicu Sebe; Alessandro Vinciarelli
Title Computer Vision for Ambient Intelligence Type Journal Article
Year 2011 Publication Journal of Ambient Intelligence and Smart Environments Abbreviated Journal JAISE
Volume 3 Issue 3 Pages 187-191
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ SGS2011a Serial 1725
 

 
Author Koen E.A. van de Sande; Theo Gevers; Cees G.M. Snoek
Title Empowering Visual Categorization with the GPU Type Journal Article
Year 2011 Publication IEEE Transactions on Multimedia Abbreviated Journal TMM
Volume 13 Issue 1 Pages 60-70
Keywords
Abstract Visual categorization is important to manage large collections of digital images and video, where textual meta-data is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. As the trend to increase computational power in newer CPU and GPU architectures is to increase their level of parallelism, exploiting this parallelism becomes an important direction to handle the computational cost of the bag-of-words approach. When optimizing a system based on the bag-of-words approach, the goal is to minimize the time it takes to process batches of images. Additionally, we consider power usage as an evaluation metric. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks by proposing two efficient algorithms for quantization and classification by exploiting the GPU hardware and the CUDA parallel programming model. The algorithms are designed to (1) keep categorization accuracy intact, (2) decompose the problem and (3) give the same numerical results. In the experiments on large scale datasets it is shown that, by using a parallel implementation on the GeForce GTX260 GPU, classifying unseen images is 4.8 times faster than a quad-core CPU version on the Core i7 920, while giving the exact same numerical results. In addition, we show how the algorithms can be generalized to other applications, such as text retrieval and video retrieval. Moreover, when the obtained speedup is used to process extra video frames in a video retrieval benchmark, the accuracy of visual categorization is improved by 29%.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ SGS2011b Serial 1729
 

 
Author Nataliya Shapovalova; Wenjuan Gong; Marco Pedersoli; Xavier Roca; Jordi Gonzalez
Title On Importance of Interactions and Context in Human Action Recognition Type Conference Article
Year 2011 Publication 5th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal
Volume 6669 Issue Pages 58-66
Keywords
Abstract This paper is focused on the automatic recognition of human events in static images. Popular techniques use knowledge of the human pose for inferring the action, and the most recent approaches tend to combine pose information with either knowledge of the scene or of the objects with which the human interacts. Our approach makes a step forward in this direction by combining the human pose with the scene in which the human is placed, together with the spatial relationships between humans and objects. Based on standard, simple descriptors like HOG and SIFT, recognition performance is enhanced when these three types of knowledge are taken into account. Results obtained on the PASCAL 2010 Action Recognition Dataset demonstrate that our technique reaches state-of-the-art results using simple descriptors and classifiers.
Address Las Palmas de Gran Canaria, Spain
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor J. Vitria, J.M. Sanches, and M. Hernandez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-21256-7 Medium
Area Expedition Conference IbPRIA
Notes ISE Approved no
Call Number Admin @ si @ SGP2011 Serial 1750
 

 
Author Marco Pedersoli; Andrea Vedaldi; Jordi Gonzalez
Title A Coarse-to-fine Approach for Fast Deformable Object Detection Type Conference Article
Year 2011 Publication IEEE conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 1353-1360
Keywords
Abstract
Address Colorado Springs, USA
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPR
Notes ISE Approved no
Call Number Admin @ si @ PVG2011 Serial 1764
 

 
Author Bhaskar Chakraborty; Michael Holte; Thomas B. Moeslund; Jordi Gonzalez
Title Selective Spatio-Temporal Interest Points Type Journal Article
Year 2012 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU
Volume 116 Issue 3 Pages 396-410
Keywords
Abstract Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH), validates our approach and shows state-of-the-art performance. Due to the unavailability of ground truth action annotation data for the Multi-KTH dataset, we introduce an actor specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in the realistic scenario, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1077-3142 ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ CHM2012 Serial 1806
 

 
Author Koen E.A. van de Sande; Jasper Uijlings; Theo Gevers; Arnold Smeulders
Title Segmentation as Selective Search for Object Recognition Type Conference Article
Year 2011 Publication 13th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 1879-1886
Keywords
Abstract For object recognition, the current state-of-the-art is based on exhaustive search. However, to enable the use of more expensive features and classifiers and thereby progress beyond the state-of-the-art, a selective search strategy is needed. Therefore, we adapt segmentation as a selective search by reconsidering segmentation: We propose to generate many approximate locations over few and precise object delineations because (1) an object whose location is never generated can not be recognised and (2) appearance and immediate nearby context are most effective for object recognition. Our method is class-independent and is shown to cover 96.7% of all objects in the Pascal VOC 2007 test set using only 1,536 locations per image. Our selective search enables the use of the more expensive bag-of-words method which we use to substantially improve the state-of-the-art by up to 8.5% for 8 out of 20 classes on the Pascal VOC 2010 detection challenge.
Address Barcelona
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1550-5499 ISBN 978-1-4577-1101-5 Medium
Area Expedition Conference ICCV
Notes ISE Approved no
Call Number Admin @ si @ SUG2011 Serial 1780
 

 
Author Ivan Huerta; Ariel Amato; Xavier Roca; Jordi Gonzalez
Title Exploiting Multiple Cues in Motion Segmentation Based on Background Subtraction Type Journal Article
Year 2013 Publication Neurocomputing Abbreviated Journal NEUCOM
Volume 100 Issue Pages 183–196
Keywords Motion segmentation; Shadow suppression; Colour segmentation; Edge segmentation; Ghost detection; Background subtraction
Abstract This paper presents a novel algorithm for mobile-object segmentation from static background scenes, which is both robust and accurate under most of the common problems found in motion segmentation. In our first contribution, a case analysis of motion segmentation errors is presented taking into account the inaccuracies associated with different cues, namely colour, edge and intensity. Our second contribution is a hybrid architecture which copes with the main issues observed in the case analysis by fusing the knowledge from the aforementioned three cues and a temporal difference algorithm. On one hand, we enhance the colour and edge models to solve not only global and local illumination changes (i.e. shadows and highlights) but also the camouflage in intensity. In addition, local information is also exploited to solve the camouflage in chroma. On the other hand, the intensity cue is applied when colour and edge cues are not available because their values are beyond the dynamic range. Additionally, a temporal difference scheme is included to segment motion where those three cues cannot be reliably computed, for example in those background regions not visible during the training period. Lastly, our approach is extended for handling ghost detection. The proposed method obtains very accurate and robust motion segmentation results in multiple indoor and outdoor scenarios, while outperforming the most-referred state-of-the-art approaches.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ HAR2013 Serial 1808
 

 
Author Bhaskar Chakraborty; Andrew Bagdanov; Jordi Gonzalez; Xavier Roca
Title Human Action Recognition Using an Ensemble of Body-Part Detectors Type Journal Article
Year 2013 Publication Expert Systems Abbreviated Journal EXSY
Volume 30 Issue 2 Pages 101-114
Keywords Human action recognition; Body-part detection; Hidden Markov model
Abstract This paper describes an approach to human action recognition based on a probabilistic optimization model of body parts using a hidden Markov model (HMM). Our method is able to distinguish between similar actions by only considering the body parts that contribute most to the actions, for example, legs for walking, jogging and running; arms for boxing, waving and clapping. We apply HMMs to model the stochastic movement of the body parts for action recognition. The HMM construction uses an ensemble of body-part detectors, followed by grouping of part detections, to perform human identification. Three example-based body-part detectors are trained to detect three components of the human body: the head, legs and arms. These detectors cope with viewpoint changes and self-occlusions through the use of ten sub-classifiers that detect body parts over a specific range of viewpoints. Each sub-classifier is a support vector machine trained on features selected for their discriminative power for each particular part/viewpoint combination. Grouping of these detections is performed using a simple geometric constraint model that yields a viewpoint-invariant human detector. We test our approach on three publicly available action datasets: the KTH dataset, Weizmann dataset and HumanEva dataset. Our results illustrate that with a simple and compact representation we can achieve robust recognition of human actions comparable to the most complex, state-of-the-art methods.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ CBG2013 Serial 1809
 

 
Author Nataliya Shapovalova; Carles Fernandez; Xavier Roca; Jordi Gonzalez
Title Semantics of Human Behavior in Image Sequences Type Book Chapter
Year 2011 Publication Computer Analysis of Human Behavior Abbreviated Journal
Volume Issue 7 Pages 151-182
Keywords
Abstract Human behavior is contextualized and understanding the scene of an action is crucial for giving proper semantics to behavior. In this chapter we present a novel approach for scene understanding. The emphasis of this work is on the particular case of Human Event Understanding. We introduce a new taxonomy to organize the different semantic levels of the Human Event Understanding framework proposed. Such a framework particularly contributes to the scene understanding domain by (i) extracting behavioral patterns from the integrative analysis of spatial, temporal, and contextual evidence and (ii) integrative analysis of bottom-up and top-down approaches in Human Event Understanding. We will explore how the information about interactions between humans and their environment influences the performance of activity recognition, and how this can be extrapolated to the temporal domain in order to extract higher inferences from human events observed in sequences of images.
Address
Corporate Author Thesis
Publisher Springer London Place of Publication Editor Albert Ali Salah;
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-0-85729-993-2 Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ SFR2011 Serial 1810
 

 
Author Bhaskar Chakraborty; Michael Holte; Thomas B. Moeslund; Jordi Gonzalez; Xavier Roca
Title A Selective Spatio-Temporal Interest Point Detector for Human Action Recognition in Complex Scenes Type Conference Article
Year 2011 Publication 13th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 1776-1783
Keywords
Abstract Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach for STIP detection by applying surround suppression combined with local and temporal constraints. Our method is significantly different from existing STIP detectors and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on existing benchmark datasets, and more challenging datasets of complex scenes, validate our approach and show state-of-the-art performance.
Address Barcelona
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1550-5499 ISBN 978-1-4577-1101-5 Medium
Area Expedition Conference ICCV
Notes ISE Approved no
Call Number Admin @ si @ CHM2011 Serial 1811
 

 
Author Wenjuan Gong; Jürgen Brauer; Michael Arens; Jordi Gonzalez
Title Modeling vs. Learning Approaches for Monocular 3D Human Pose Estimation Type Conference Article
Year 2011 Publication 1st IEEE International Workshop on Performance Evaluation on Recognition of Human Actions and Pose Estimation Methods Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address London, United Kingdom
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference PERHAPS
Notes ISE Approved no
Call Number Admin @ si @ GBA2011 Serial 1812