Publicacions CVC -- Query Results

Records
Author	T. Mouats; N. Aouf; Angel Sappa; Cristhian A. Aguilera-Carrasco; Ricardo Toledo
Title	Multi-Spectral Stereo Odometry			Type	Journal Article
Year	2015	Publication	IEEE Transactions on Intelligent Transportation Systems	Abbreviated Journal	TITS
Volume	16	Issue	3	Pages	1210-1224
Keywords	Egomotion estimation; feature matching; multispectral odometry (MO); optical flow; stereo odometry; thermal imagery
Abstract	In this paper, we investigate the problem of visual odometry for ground vehicles based on the simultaneous utilization of multispectral cameras. It encompasses a stereo rig composed of optical (visible) and thermal sensors. The novelty resides in the localization of the cameras as a stereo setup rather than two monocular cameras of different spectrums. To the best of our knowledge, this is the first time such a task has been attempted. Log-Gabor wavelets at different orientations and scales are used to extract interest points from both images. These are then described using a combination of frequency and spatial information within the local neighborhood. Matches between the pairs of multimodal images are computed using the cosine similarity function based on the descriptors. A pyramidal Lucas–Kanade tracker is also introduced to tackle temporal feature matching within challenging sequences of the data sets. The vehicle egomotion is computed from the triangulated 3-D points corresponding to the matched features. A windowed version of bundle adjustment incorporating Gauss–Newton optimization is utilized for motion estimation. An outlier removal scheme is also included within the framework. Multispectral data sets were generated and used as a test bed. They correspond to real outdoor scenarios captured using our multimodal setup. Finally, detailed results validating the proposed strategy are illustrated.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1524-9050	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; 600.055; 600.076			Approved	no
Call Number	Admin @ si @ MAS2015a			Serial	2533



Author	T. Alejandra Vidal; Andrew J. Davison; Juan Andrade; David W. Murray
Title	Active Control for Single Camera SLAM			Type	Miscellaneous
Year	2006	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
Volume		Issue		Pages	1930–1936
Keywords
Abstract
Address	Orlando (Florida)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	DAG @ dag @ VDA2006			Serial	666



Author	T. Alejandra Vidal; A. Sanfeliu; Juan Andrade
Title	Autonomous Single Camera Exploration			Type	Miscellaneous
Year	2006	Publication	Jornada de Recerca en Automatica, Visio i Robotica	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	Barcelona (Spain)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ VSA2006c			Serial	680



Author	Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz
Title	Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries			Type	Journal Article
Year	2021	Publication	IEEE Transactions on Pattern Analysis and Machine Intelligence	Abbreviated Journal	TPAMI
Volume		Issue		Pages
Keywords
Abstract	We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no proj			Approved	no
Call Number	Admin @ si @ SEL2021			Serial	3656



Author	Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz
Title	LSTA: Long Short-Term Attention for Egocentric Action Recognition			Type	Conference Article
Year	2019	Publication	32nd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	9946-9955
Keywords
Abstract	Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires a fine-grained discrimination of small objects and their manipulation. While some methods are based on strong supervision and attention mechanisms, they are either annotation consuming or do not take spatio-temporal patterns into account. In this paper we propose LSTA as a mechanism to focus on features from spatially relevant parts while attention is being tracked smoothly across the video sequence. We demonstrate the effectiveness of LSTA on egocentric activity recognition with an end-to-end trainable two-stream architecture, achieving state-of-the-art performance on four standard benchmarks.
Address	California; June 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPR
Notes	HuPBA; no proj			Approved	no
Call Number	Admin @ si @ SEL2019			Serial	3333



Author	Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz
Title	Gate-Shift Networks for Video Action Recognition			Type	Conference Article
Year	2020	Publication	33rd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice, however, because of the large number of parameters and computations involved, they may under-perform in the absence of sufficiently large datasets for training them at scale. In this paper we introduce spatial gating in spatial-temporal decomposition of 3D kernels. We implement this concept with the Gate-Shift Module (GSM). GSM is lightweight and turns a 2D-CNN into a highly efficient spatio-temporal feature extractor. With GSM plugged in, a 2D-CNN learns to adaptively route features through time and combine them, with almost no additional parameters or computational overhead. We perform an extensive evaluation of the proposed module to study its effectiveness in video action recognition, achieving state-of-the-art results on Something Something-V1 and Diving48 datasets, and obtaining competitive results on EPIC-Kitchens with far less model complexity.
Address	Virtual CVPR
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPR
Notes	HuPBA; no proj			Approved	no
Call Number	Admin @ si @ SEL2020			Serial	3438



Author	Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz
Title	Gate-Shift-Fuse for Video Action Recognition			Type	Journal Article
Year	2023	Publication	IEEE Transactions on Pattern Analysis and Machine Intelligence	Abbreviated Journal	TPAMI
Volume	45	Issue	9	Pages	10913-10928
Keywords	Action Recognition; Video Classification; Spatial Gating; Channel Fusion
Abstract	Convolutional Neural Networks are the de facto models for image recognition. However, 3D CNNs, the straightforward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity, which requires large-scale annotated datasets to train them at scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs. Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data-dependent manner. GSF leverages grouped spatial gating to decompose the input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into efficient and high-performing spatio-temporal feature extractors, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks.
Address	1 Sept. 2023
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no mention			Approved	no
Call Number	Admin @ si @ SEL2023			Serial	3814



Author	Svebor Karaman; Giuseppe Lisanti; Andrew Bagdanov; Alberto del Bimbo
Title	From re-identification to identity inference: Labeling consistency by local similarity constraints			Type	Book Chapter
Year	2014	Publication	Person Re-Identification	Abbreviated Journal
Volume	2	Issue		Pages	287-307
Keywords	re-identification; Identity inference; Conditional random fields; Video surveillance
Abstract	In this chapter, we introduce the problem of identity inference as a generalization of person re-identification. It is most appropriate to distinguish identity inference from re-identification in situations where a large number of observations must be identified without knowing a priori that groups of test images represent the same individual. The standard single- and multishot person re-identification common in the literature are special cases of our formulation. We present an approach to solving identity inference by modeling it as a labeling problem in a Conditional Random Field (CRF). The CRF model ensures that the final labeling gives similar labels to detections that are similar in feature space. Experimental results are given on the ETHZ, i-LIDS and CAVIAR datasets. Our approach yields state-of-the-art performance for multishot re-identification, and our results on the more general identity inference problem demonstrate that we are able to infer the identity of very many examples even with very few labeled images in the gallery.
Address
Corporate Author				Thesis
Publisher	Springer London	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	2191-6586	ISBN	978-1-4471-6295-7	Medium
Area		Expedition		Conference
Notes	LAMP; 600.079			Approved	no
Call Number	Admin @ si @ KLB2014b			Serial	2521



Author	Svebor Karaman; Giuseppe Lisanti; Andrew Bagdanov; Alberto del Bimbo
Title	Leveraging local neighborhood topology for large scale person re-identification			Type	Journal Article
Year	2014	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	47	Issue	12	Pages	3767–3778
Keywords	Re-identification; Conditional random field; Semi-supervised; ETHZ; CAVIAR; 3DPeS; CMV100
Abstract	In this paper we describe a semi-supervised approach to person re-identification that combines discriminative models of person identity with a Conditional Random Field (CRF) to exploit the local manifold approximation induced by the nearest neighbor graph in feature space. The linear discriminative models learned on few gallery images provide coarse separation of probe images into identities, while a graph topology defined by distances between all person images in feature space leverages local support for label propagation in the CRF. We evaluate our approach using multiple scenarios on several publicly available datasets, where the number of identities varies from 28 to 191 and the number of images ranges between 1003 and 36 171. We demonstrate that the discriminative model and the CRF are complementary and that the combination of both leads to significant improvement over state-of-the-art approaches. We further demonstrate how the performance of our approach improves with increasing test data and also with increasing amounts of additional unlabeled data.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 601.240; 600.079			Approved	no
Call Number	Admin @ si @ KLB2014a			Serial	2522



Author	Svebor Karaman; Andrew Bagdanov; Lea Landucci; Gianpaolo D'Amico; Andrea Ferracani; Daniele Pezzatini; Alberto del Bimbo
Title	Personalized multimedia content delivery on an interactive table by passive observation of museum visitors			Type	Journal Article
Year	2016	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
Volume	75	Issue	7	Pages	3787-3811
Keywords	Computer vision; Video surveillance; Cultural heritage; Multimedia museum; Personalization; Natural interaction; Passive profiling
Abstract	The amount of multimedia data collected in museum databases is growing fast, while the capacity of museums to display information to visitors is acutely limited by physical space. Museums must seek the perfect balance of information given on individual pieces in order to provide sufficient information to aid visitor understanding while maintaining sparse usage of the walls and guaranteeing high appreciation of the exhibit. Moreover, museums often target the interests of average visitors instead of the entire spectrum of different interests each individual visitor might have. Finally, visiting a museum should not be an experience contained in the physical space of the museum but a door opened onto a broader context of related artworks, authors, artistic trends, etc. In this paper we describe the MNEMOSYNE system that attempts to address these issues through a new multimedia museum experience. Based on passive observation, the system builds a profile of the artworks of interest for each visitor. These profiles of interest are then used to drive an interactive table that personalizes multimedia content delivery. The natural user interface on the interactive table uses the visitor’s profile, an ontology of museum content and a recommendation system to personalize exploration of multimedia content. At the end of their visit, the visitor can take home a personalized summary of their visit on a custom mobile application. In this article we describe in detail each component of our approach as well as the first field trials of our prototype system built and deployed at our permanent exhibition space at LeMurate (http://www.lemurate.comune.fi.it/lemurate/) in Florence together with the first results of the evaluation process during the official installation in the National Museum of Bargello (http://www.uffizi.firenze.it/musei/?m=bargello).
Address
Corporate Author				Thesis
Publisher	Springer US	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1380-7501	ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 601.240; 600.079			Approved	no
Call Number	Admin @ si @ KBL2016			Serial	2520



Author	Susana Alvarez; Xavier Otazu; Maria Vanrell
Title	Image Segmentation Based on Inter-Feature Distance Maps			Type	Book Chapter
Year	2005	Publication	Frontiers in Artificial Intelligence and Applications, IOS Press	Abbreviated Journal
Volume	131	Issue		Pages	75–82
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	CIC			Approved	no
Call Number	CAT @ cat @ AOV2005			Serial	569



Author	Susana Alvarez; Maria Vanrell
Title	Texton theory revisited: a bag-of-words approach to combine textons			Type	Journal Article
Year	2012	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	45	Issue	12	Pages	4312-4325
Keywords
Abstract	The aim of this paper is to revisit an old theory of texture perception and update its computational implementation by extending it to colour. With this in mind we try to capture the optimality of perceptual systems. This is achieved in the proposed approach by sharing well-known early stages of the visual processes and extracting low-dimensional features that perfectly encode adequate properties for a large variety of textures without needing further learning stages. We propose several descriptors in a bag-of-words framework that are derived from different quantisation models onto the feature spaces. Our perceptual features are directly given by the shape and colour attributes of image blobs, which are the textons. In this way we avoid learning visual words and directly build the vocabularies on these low-dimensional texton spaces. The main differences between the proposed descriptors lie in how co-occurrence of blob attributes is represented in the vocabularies. Our approach outperforms the current state of the art in colour texture description, as shown in several experiments on large texture datasets.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	0031-3203	ISBN		Medium
Area		Expedition		Conference
Notes	CIC			Approved	no
Call Number	Admin @ si @ AlV2012a			Serial	2130



Author	Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu
Title	3D Texton Spaces for color-texture retrieval			Type	Conference Article
Year	2010	Publication	7th International Conference on Image Analysis and Recognition	Abbreviated Journal
Volume	6111	Issue		Pages	354–363
Keywords
Abstract	Color and texture are visual cues of different nature; their integration in a useful visual descriptor is not an easy problem. One way to combine both features is to compute spatial texture descriptors independently on each color channel. Another way is to do the integration at the descriptor level. In this case the problem of normalizing both cues arises. In this paper we solve the latter problem by fusing color and texture through distances in texton spaces. Textons are the attributes of image blobs and they are responsible for texture discrimination as defined in Julesz’s Texton theory. We describe them in two low-dimensional and uniform spaces, namely, shape and color. The dissimilarity between color texture images is computed by combining the distances in these two spaces. Following this approach, we propose our TCD descriptor, which outperforms current state-of-the-art methods in the two different approaches mentioned above, early combination with LBP and late combination with MPEG-7. This is done on an image retrieval experiment over a highly diverse texture dataset from Corel.
Address
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor	A.C. Campilho and M.S. Kamel
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-642-13771-6	Medium
Area		Expedition		Conference	ICIAR
Notes	CIC			Approved	no
Call Number	CAT @ cat @ ASV2010a			Serial	1325



Author	Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu
Title	Perceptual color texture codebooks for retrieving in highly diverse texture datasets			Type	Conference Article
Year	2010	Publication	20th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	866–869
Keywords
Abstract	Color and texture are visual cues of different nature; their integration in a useful visual descriptor is not an obvious step. One way to combine both features is to compute texture descriptors independently on each color channel. A second way is to integrate the features at a descriptor level, in which case the problem of normalizing both cues arises. Significant progress in object recognition in recent years has provided the bag-of-words framework, which again deals with the problem of feature combination through the definition of vocabularies of visual words. Inspired by this framework, here we present perceptual textons that allow us to fuse color and texture at the level of p-blobs, which is our feature detection step. Feature representation is based on two uniform spaces representing the attributes of the p-blobs. The low dimensionality of these texton spaces allows us to bypass the usual problems of previous approaches. Firstly, there is no need for normalization between cues; and secondly, vocabularies are directly obtained from the perceptual properties of texton spaces without any learning step. Our proposal improves on the current state of the art of color-texture descriptors in an image retrieval experiment over a highly diverse texture dataset from Corel.
Address	Istanbul (Turkey)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1051-4651	ISBN	978-1-4244-7542-1	Medium
Area		Expedition		Conference	ICPR
Notes	CIC			Approved	no
Call Number	CAT @ cat @ ASV2010b			Serial	1426



Author	Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu
Title	Low-dimensional and Comprehensive Color Texture Description			Type	Journal Article
Year	2012	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	116	Issue	1	Pages	54-67
Keywords
Abstract	Image retrieval can be addressed by combining standard descriptors, such as those of MPEG-7, which are defined independently for each visual cue (e.g. SCD or CLD for color, HTD for texture or EHD for edges). A common problem is to combine similarities coming from descriptors representing different concepts in different spaces. In this paper we propose a color texture description that bypasses this problem from its inherent definition. It is based on a low-dimensional space with 6 perceptual axes. Texture is described in a 3D space derived from a direct implementation of the original Julesz’s Texton theory and color is described in a 3D perceptual space. This early fusion through the blob concept in these two bounded spaces avoids the problem and allows us to derive a sparse color-texture descriptor that achieves similar performance compared to MPEG-7 in image retrieval. Moreover, our descriptor presents comprehensive qualities since it can also be applied either in segmentation or browsing: (a) a dense image representation is defined from the descriptor showing a reasonable performance in locating texture patterns included in complex images; and (b) a vocabulary of basic terms is derived to build an intermediate level descriptor in natural language improving browsing by bridging the semantic gap.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1077-3142	ISBN		Medium
Area		Expedition		Conference
Notes	CAT; CIC			Approved	no
Call Number	Admin @ si @ ASV2012			Serial	1827