|
Marco Pedersoli, Jordi Gonzalez, Andrew Bagdanov, & Xavier Roca. (2011). Efficient Discriminative Multiresolution Cascade for Real-Time Human Detection Applications. PRL - Pattern Recognition Letters, 32(13), 1581–1587.
Abstract: Human detection is fundamental in many machine vision applications, like video surveillance, driving assistance, action recognition and scene understanding. However in most of these applications real-time performance is necessary and this is not achieved yet by current detection methods.
This paper presents a new method for human detection based on a multiresolution cascade of Histograms of Oriented Gradients (HOG) that can highly reduce the computational cost of detection search without affecting accuracy. The method consists of a cascade of sliding window detectors. Each detector is a linear Support Vector Machine (SVM) composed of HOG features at different resolutions, from coarse at the first level to fine at the last one.
In contrast to previous methods, our approach uses a non-uniform stride of the sliding window that is defined by the feature resolution and allows the detection to be incrementally refined as going from coarse-to-fine resolution. In this way, the speed-up of the cascade is not only due to the fewer number of features computed at the first levels of the cascade, but also to the reduced number of windows that need to be evaluated at the coarse resolution. Experimental results show that our method reaches a detection rate comparable with the state-of-the-art of detectors based on HOG features, while at the same time the detection search is up to 23 times faster.
|
|
|
Ariel Amato, Mikhail Mozerov, Andrew Bagdanov, & Jordi Gonzalez. (2011). Accurate Moving Cast Shadow Suppression Based on Local Color Constancy detection. TIP - IEEE Transactions on Image Processing, 20(10), 2954–2966.
Abstract: This paper describes a novel framework for detection and suppression of properly shadowed regions for most possible scenarios occurring in real video sequences. Our approach requires no prior knowledge about the scene, nor is it restricted to specific scene structures. Furthermore, the technique can detect both achromatic and chromatic shadows even in the presence of camouflage that occurs when foreground regions are very similar in color to shadowed regions. The method exploits local color constancy properties due to reflectance suppression over shadowed regions. To detect shadowed regions in a scene, the values of the background image are divided by values of the current frame in the RGB color space. We show how this luminance ratio can be used to identify segments with low gradient constancy, which in turn distinguish shadows from foreground. Experimental results on a collection of publicly available datasets illustrate the superior performance of our method compared with the most sophisticated, state-of-the-art shadow detection algorithms. These results show that our approach is robust and accurate over a broad range of shadow types and challenging video conditions.
|
|
|
Arjan Gijsenij, Theo Gevers, & Joost Van de Weijer. (2011). Computational Color Constancy: Survey and Experiments. TIP - IEEE Transactions on Image Processing, 20(9), 2475–2489.
Abstract: Computational color constancy is a fundamental prerequisite for many computer vision applications. This paper presents a survey of many recent developments and state-of-the- art methods. Several criteria are proposed that are used to assess the approaches. A taxonomy of existing algorithms is proposed and methods are separated in three groups: static methods, gamut-based methods and learning-based methods. Further, the experimental setup is discussed including an overview of publicly available data sets. Finally, various freely available methods, of which some are considered to be state-of-the-art, are evaluated on two data sets.
Keywords: computational color constancy;computer vision application;gamut-based method;learning-based method;static method;colour vision;computer vision;image colour analysis;learning (artificial intelligence);lighting
|
|
|
Carles Fernandez, Pau Baiget, Xavier Roca, & Jordi Gonzalez. (2011). Determining the Best Suited Semantic Events for Cognitive Surveillance. EXSY - Expert Systems with Applications, 38(4), 4068–4079.
Abstract: State-of-the-art systems on cognitive surveillance identify and describe complex events in selected domains, thus providing end-users with tools to easily access the contents of massive video footage. Nevertheless, as the complexity of events increases in semantics and the types of indoor/outdoor scenarios diversify, it becomes difficult to assess which events describe better the scene, and how to model them at a pixel level to fulfill natural language requests. We present an ontology-based methodology that guides the identification, step-by-step modeling, and generalization of the most relevant events to a specific domain. Our approach considers three steps: (1) end-users provide textual evidence from surveilled video sequences; (2) transcriptions are analyzed top-down to build the knowledge bases for event description; and (3) the obtained models are used to generalize event detection to different image sequences from the surveillance domain. This framework produces user-oriented knowledge that improves on existing advanced interfaces for video indexing and retrieval, by determining the best suited events for video understanding according to end-users. We have conducted experiments with outdoor and indoor scenes showing thefts, chases, and vandalism, demonstrating the feasibility and generalization of this proposal.
Keywords: Cognitive surveillance; Event modeling; Content-based video retrieval; Ontologies; Advanced user interfaces
|
|
|
Carles Fernandez, Pau Baiget, Xavier Roca, & Jordi Gonzalez. (2011). Augmenting Video Surveillance Footage with Virtual Agents for Incremental Event Evaluation. PRL - Pattern Recognition Letters, 32(6), 878–889.
Abstract: The fields of segmentation, tracking and behavior analysis demand for challenging video resources to test, in a scalable manner, complex scenarios like crowded environments or scenes with high semantics. Nevertheless, existing public databases cannot scale the presence of appearing agents, which would be useful to study long-term occlusions and crowds. Moreover, creating these resources is expensive and often too particularized to specific needs. We propose an augmented reality framework to increase the complexity of image sequences in terms of occlusions and crowds, in a scalable and controllable manner. Existing datasets can be increased with augmented sequences containing virtual agents. Such sequences are automatically annotated, thus facilitating evaluation in terms of segmentation, tracking, and behavior recognition. In order to easily specify the desired contents, we propose a natural language interface to convert input sentences into virtual agent behaviors. Experimental tests and validation in indoor, street, and soccer environments are provided to show the feasibility of the proposed approach in terms of robustness, scalability, and semantics.
|
|