Publicacions CVC -- Query Results

[141–150] << 151 152 153 154 155 156 157 158 159 160 >> [161–170]

Details

Records
Author	Fahad Shahbaz Khan; Joost Van de Weijer; Muhammad Anwer Rao; Michael Felsberg; Carlo Gatta
Title	Semantic Pyramids for Gender and Action Recognition			Type	Journal Article
Year	2014	Publication	IEEE Transactions on Image Processing	Abbreviated Journal	TIP
Volume	23	Issue	8	Pages	3633-3645
Keywords
Abstract	Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1057-7149	ISBN		Medium
Area		Expedition		Conference
Notes	CIC; LAMP; 601.160; 600.074; 600.079;MILAB			Approved	no
Call Number	Admin @ si @ KWR2014			Serial	2507
Permanent link to this record



Author	Fahad Shahbaz Khan; Joost Van de Weijer; Muhammad Anwer Rao; Andrew Bagdanov; Michael Felsberg; Jorma
Title	Scale coding bag of deep features for human attribute and action recognition			Type	Journal Article
Year	2018	Publication	Machine Vision and Applications	Abbreviated Journal	MVAP
Volume	29	Issue	1	Pages	55-71
Keywords	Action recognition; Attribute recognition; Bag of deep features
Abstract	Most approaches to human attribute and action recognition in still images are based on image representation in which multi-scale local features are pooled across scale into a single, scale-invariant encoding. Both in bag-of-words and the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a bag of deep features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state of the art.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.068; 600.079; 600.106; 600.120			Approved	no
Call Number	Admin @ si @ KWR2018			Serial	3107
Permanent link to this record



Author	Fahad Shahbaz Khan; Joost Van de Weijer; Maria Vanrell
Title	Top-Down Color Attention for Object Recognition			Type	Conference Article
Year	2009	Publication	12th International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	979 - 986
Keywords
Abstract	Generally the bag-of-words based image representation follows a bottom-up paradigm. The subsequent stages of the process: feature detection, feature description, vocabulary construction and image representation are performed independent of the intentioned object classes to be detected. In such a framework, combining multiple cues such as shape and color often provides below-expected results. This paper presents a novel method for recognizing object categories when using multiple cues by separating the shape and color cue. Color is used to guide attention by means of a top-down category-specific attention map. The color attention map is then further deployed to modulate the shape features by taking more features from regions within an image that are likely to contain an object instance. This procedure leads to a category-specific image histogram representation for each category. Furthermore, we argue that the method combines the advantages of both early and late fusion. We compare our approach with existing methods that combine color and shape cues on three data sets containing varied importance of both cues, namely, Soccer ( color predominance), Flower (color and shape parity), and PASCAL VOC Challenge 2007 (shape predominance). The experiments clearly demonstrate that in all three data sets our proposed framework significantly outperforms the state-of-the-art methods for combining color and shape information.
Address	Kyoto, Japan
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1550-5499	ISBN	978-1-4244-4420-5	Medium
Area		Expedition		Conference	ICCV
Notes	CIC			Approved	no
Call Number	CAT @ cat @ SWV2009			Serial	1196
Permanent link to this record



Author	Fahad Shahbaz Khan; Joost Van de Weijer; Maria Vanrell
Title	Who Painted this Painting?			Type	Conference Article
Year	2010	Publication	Proceedings of The CREATE 2010 Conference	Abbreviated Journal
Volume		Issue		Pages	329–333
Keywords
Abstract
Address	Gjovik (Norway)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CREATE
Notes	CIC			Approved	no
Call Number	CAT @ cat @ KWV2010			Serial	1329
Permanent link to this record



Author	Fahad Shahbaz Khan; Joost Van de Weijer; Maria Vanrell
Title	Modulating Shape Features by Color Attention for Object Recognition			Type	Journal Article
Year	2012	Publication	International Journal of Computer Vision	Abbreviated Journal	IJCV
Volume	98	Issue	1	Pages	49-64
Keywords
Abstract	Bag-of-words based image representation is a successful approach for object recognition. Generally, the subsequent stages of the process: feature detection,feature description, vocabulary construction and image representation are performed independent of the intentioned object classes to be detected. In such a framework, it was found that the combination of different image cues, such as shape and color, often obtains below expected results. This paper presents a novel method for recognizing object categories when using ultiple cues by separately processing the shape and color cues and combining them by modulating the shape features by category specific color attention. Color is used to compute bottom up and top-down attention maps. Subsequently, these color attention maps are used to modulate the weights of the shape features. In regions with higher attention shape features are given more weight than in regions with low attention. We compare our approach with existing methods that combine color and shape cues on five data sets containing varied importance of both cues, namely, Soccer (color predominance), Flower (color and hape parity), PASCAL VOC 2007 and 2009 (shape predominance) and Caltech-101 (color co-interference). The experiments clearly demonstrate that in all five data sets our proposed framework significantly outperforms existing methods for combining color and shape information.
Address
Corporate Author				Thesis
Publisher	Springer Netherlands	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	0920-5691	ISBN		Medium
Area		Expedition		Conference
Notes	CIC			Approved	no
Call Number	Admin @ si @ KWV2012			Serial	1864
Permanent link to this record



Author	Fahad Shahbaz Khan; Joost Van de Weijer; Andrew Bagdanov; Michael Felsberg
Title	Scale Coding Bag-of-Words for Action Recognition			Type	Conference Article
Year	2014	Publication	22nd International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	1514-1519
Keywords
Abstract	Recognizing human actions in still images is a challenging problem in computer vision due to significant amount of scale, illumination and pose variation. Given the bounding box of a person both at training and test time, the task is to classify the action associated with each bounding box in an image. Most state-of-the-art methods use the bag-of-words paradigm for action recognition. The bag-of-words framework employing a dense multi-scale grid sampling strategy is the de facto standard for feature detection. This results in a scale invariant image representation where all the features at multiple-scales are binned in a single histogram. We argue that such a scale invariant strategy is sub-optimal since it ignores the multi-scale information available with each bounding box of a person. This paper investigates alternative approaches to scale coding for action recognition in still images. We encode multi-scale information explicitly in three different histograms for small, medium and large scale visual-words. Our first approach exploits multi-scale information with respect to the image size. In our second approach, we encode multi-scale information relative to the size of the bounding box of a person instance. In each approach, the multi-scale histograms are then concatenated into a single representation for action classification. We validate our approaches on the Willow dataset which contains seven action categories: interacting with computer, photography, playing music, riding bike, riding horse, running and walking. Our results clearly suggest that the proposed scale coding approaches outperform the conventional scale invariant technique. Moreover, we show that our approach obtains promising results compared to more complex state-of-the-art methods.
Address	Stockholm; August 2014
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	CIC; LAMP; 601.240; 600.074; 600.079			Approved	no
Call Number	Admin @ si @ KWB2014			Serial	2450
Permanent link to this record



Author	Fahad Shahbaz Khan; Joost Van de Weijer; Andrew Bagdanov; Maria Vanrell
Title	Portmanteau Vocabularies for Multi-Cue Image Representation			Type	Conference Article
Year	2011	Publication	25th Annual Conference on Neural Information Processing Systems	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use Information theoretic vocabulary compression to find discriminative combinations of cues and the resulting vocabulary of portmanteau words is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. State-of-the-art results on both the Oxford Flower-102 and Caltech-UCSD Bird-200 datasets demonstrate the effectiveness of our technique compared to other, significantly more complex approaches to multi-cue image representation
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	NIPS
Notes	CIC			Approved	no
Call Number	Admin @ si @ KWB2011			Serial	1865
Permanent link to this record



Author	Fahad Shahbaz Khan; Jiaolong Xu; Muhammad Anwer Rao; Joost Van de Weijer; Andrew Bagdanov; Antonio Lopez
Title	Recognizing Actions through Action-specific Person Detection			Type	Journal Article
Year	2015	Publication	IEEE Transactions on Image Processing	Abbreviated Journal	TIP
Volume	24	Issue	11	Pages	4422-4432
Keywords
Abstract	Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) the existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, the direct training of action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, the transfer learning is able to adapt an existing detector to propose higher quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test tim- , outperforms on both data sets state-of-the-art methods, which do use person locations.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1057-7149	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; LAMP; 600.076; 600.079			Approved	no
Call Number	Admin @ si @ KXR2015			Serial	2668
Permanent link to this record



Author	Fahad Shahbaz Khan
Title	Coloring bag-of-words based image representations			Type	Book Whole
Year	2011	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Put succinctly, the bag-of-words based image representation is the most successful approach for object and scene recognition. Within the bag-of-words framework the optimal fusion of multiple cues, such as shape, texture and color, still remains an active research domain. There exist two main approaches to combine color and shape information within the bag-of-words framework. The first approach called, early fusion, fuses color and shape at the feature level as a result of which a joint colorshape vocabulary is produced. The second approach, called late fusion, concatenates histogram representation of both color and shape, obtained independently. In the first part of this thesis, we analyze the theoretical implications of both early and late feature fusion. We demonstrate that both these approaches are suboptimal for a subset of object categories. Consequently, we propose a novel method for recognizing object categories when using multiple cues by separately processing the shape and color cues and combining them by modulating the shape features by category specific color attention. Color is used to compute bottom-up and top-down attention maps. Subsequently, the color attention maps are used to modulate the weights of the shape features. Shape features are given more weight in regions with higher attention and vice versa. The approach is tested on several benchmark object recognition data sets and the results clearly demonstrate the effectiveness of our proposed method. In the second part of the thesis, we investigate the problem of obtaining compact spatial pyramid representations for object and scene recognition. Spatial pyramids have been successfully applied to incorporate spatial information into bag-of-words based image representation. However, a major drawback of spatial pyramids is that it leads to high dimensional image representations. We present a novel framework for obtaining compact pyramid representation. The approach reduces the size of a high dimensional pyramid representation upto an order of magnitude without any significant reduction in accuracy. Moreover, we also investigate the optimal combination of multiple features such as color and shape within the context of our compact pyramid representation. Finally, we describe a novel technique to build discriminative visual words from multiple cues learned independently from training images. To this end, we use an information theoretic vocabulary compression technique to find discriminative combinations of visual cues and the resulting visual vocabulary is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. The approach is tested on standard object recognition data sets. The results obtained clearly demonstrate the effectiveness of our approach.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher		Place of Publication		Editor	Joost Van de Weijer;Maria Vanrell
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	CIC			Approved	no
Call Number	Admin @ si @ Kha2011			Serial	1838
Permanent link to this record



Author	Fadi Dornaika; Jose Manuel Alvarez; Angel Sappa; Antonio Lopez
Title	A New Framework for Stereo Sensor Pose through Road Segmentation and Registration			Type	Journal Article
Year	2011	Publication	IEEE Transactions on Intelligent Transportation Systems	Abbreviated Journal	TITS
Volume	12	Issue	4	Pages	954-966
Keywords	road detection
Abstract	This paper proposes a new framework for real-time estimation of the onboard stereo head's position and orientation relative to the road surface, which is required for any advanced driver-assistance application. This framework can be used with all road types: highways, urban, etc. Unlike existing works that rely on feature extraction in either the image domain or 3-D space, we propose a framework that directly estimates the unknown parameters from the stream of stereo pairs' brightness. The proposed approach consists of two stages that are invoked for every stereo frame. The first stage segments the road region in one monocular view. The second stage estimates the camera pose using a featureless registration between the segmented monocular road region and the other view in the stereo pair. This paper has two main contributions. The first contribution combines a road segmentation algorithm with a registration technique to estimate the online stereo camera pose. The second contribution solves the registration using a featureless method, which is carried out using two different optimization techniques: 1) the differential evolution algorithm and 2) the Levenberg-Marquardt (LM) algorithm. We provide experiments and evaluations of performance. The results presented show the validity of our proposed framework.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1524-9050	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS			Approved	no
Call Number	Admin @ si @ DAS2011; ADAS @ adas @ das2011a			Serial	1833
Permanent link to this record



Author	Fadi Dornaika; J. Ahlberg
Title	Fitting 3D face models for tracking and active appearance model training			Type	Journal
Year	2006	Publication	Image and Vision Computing, 24(9): 1010–1024	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ DoA2006			Serial	733
Permanent link to this record



Author	Fadi Dornaika; Franck Davoine
Title	Simultaneous Facial Action Tracking and Expression Recognition using a Particle Filter			Type	Miscellaneous
Year	2005	Publication	10th IEEE Int. Conference on Computer Vision (ICCV)	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	Beijing (China)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ DoD2005d			Serial	581
Permanent link to this record



Author	Fadi Dornaika; Franck Davoine
Title	Facial expression recognition in continuous videos using dynamic programming			Type	Miscellaneous
Year	2005	Publication	IEEE Int. Conference on Image Processing	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	Genova (Italy)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ DoD2005a			Serial	597
Permanent link to this record



Author	Fadi Dornaika; Franck Davoine
Title	SFM for planar scenes using image derivatives			Type	Miscellaneous
Year	2005	Publication	IEEE Int. Conference on Image Processing, 1088–1091	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	Genova (Italy)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ DoD2005b			Serial	598
Permanent link to this record



Author	Fadi Dornaika; Franck Davoine
Title	On appearance based face and facial action tracking			Type	Journal
Year	2006	Publication	IEEE Transactions on Circuits and Systems for Video Technology, 16(9): 1838–1853	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ DoD2006b			Serial	732
Permanent link to this record