Author |
T. Alejandra Vidal; Andrew J. Davison; Juan Andrade; David W. Murray |
|
|
Title |
Active Control for Single Camera SLAM |
Type |
Miscellaneous |
|
Year |
2006 |
Publication |
IEEE International Conference on Robotics and Automation, 1930–1936 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Orlando (Florida) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
|
Approved |
no |
|
|
Call Number |
DAG @ dag @ VDA2006 |
Serial |
666 |
|
|
|
|
|
Author |
T. Alejandra Vidal; A. Sanfeliu; Juan Andrade |
|
|
Title |
Autonomous Single Camera Exploration |
Type |
Miscellaneous |
|
Year |
2006 |
Publication |
Jornada de Recerca en Automatica, Visio i Robotica |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Barcelona (Spain) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
|
Approved |
no |
|
|
Call Number |
Admin @ si @ VSA2006c |
Serial |
680 |
|
|
|
|
|
Author |
Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz |
|
|
Title |
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries |
Type |
Journal Article |
|
Year |
2021 |
Publication |
IEEE Transactions on Pattern Analysis and Machine Intelligence |
Abbreviated Journal |
TPAMI |
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ SEL2021 |
Serial |
3656 |
|
|
|
|
|
Author |
Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz |
|
|
Title |
LSTA: Long Short-Term Attention for Egocentric Action Recognition |
Type |
Conference Article |
|
Year |
2019 |
Publication |
32nd IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
9946-9955 |
|
|
Keywords |
|
|
|
Abstract |
Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires fine-grained discrimination of small objects and their manipulation. While some methods rely on strong supervision and attention mechanisms, they are either annotation-intensive or do not take spatio-temporal patterns into account. In this paper we propose LSTA as a mechanism to focus on features from spatially relevant parts while attention is tracked smoothly across the video sequence. We demonstrate the effectiveness of LSTA on egocentric activity recognition with an end-to-end trainable two-stream architecture, achieving state-of-the-art performance on four standard benchmarks. |
|
|
Address |
California; June 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPR |
|
|
Notes |
HuPBA; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ SEL2019 |
Serial |
3333 |
|
|
|
|
|
Author |
Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz |
|
|
Title |
Gate-Shift Networks for Video Action Recognition |
Type |
Conference Article |
|
Year |
2020 |
Publication |
33rd IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice, however, because of the large number of parameters and computations involved, they may underperform in the absence of sufficiently large datasets for training them at scale. In this paper we introduce spatial gating in the spatial-temporal decomposition of 3D kernels. We implement this concept with the Gate-Shift Module (GSM). GSM is lightweight and turns a 2D-CNN into a highly efficient spatio-temporal feature extractor. With GSM plugged in, a 2D-CNN learns to adaptively route features through time and combine them, with almost no additional parameters or computational overhead. We perform an extensive evaluation of the proposed module to study its effectiveness in video action recognition, achieving state-of-the-art results on the Something-Something-V1 and Diving48 datasets, and obtaining competitive results on EPIC-Kitchens with far less model complexity. |
|
|
Address |
Virtual CVPR |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPR |
|
|
Notes |
HuPBA; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ SEL2020 |
Serial |
3438 |
|
|
|
|
|
Author |
Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz |
|
|
Title |
Gate-Shift-Fuse for Video Action Recognition |
Type |
Journal Article |
|
Year |
2023 |
Publication |
IEEE Transactions on Pattern Analysis and Machine Intelligence |
Abbreviated Journal |
TPAMI |
|
|
Volume |
45 |
Issue |
9 |
Pages |
10913-10928 |
|
|
Keywords |
Action Recognition; Video Classification; Spatial Gating; Channel Fusion |
|
|
Abstract |
Convolutional Neural Networks are the de facto models for image recognition. However, 3D CNNs, the straightforward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is their increased computational complexity, which requires large-scale annotated datasets to train them at scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs. Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data-dependent manner. GSF leverages grouped spatial gating to decompose the input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into efficient and high-performing spatio-temporal feature extractors, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks. |
|
|
Address |
1 Sept. 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no mention |
Approved |
no |
|
|
Call Number |
Admin @ si @ SEL2023 |
Serial |
3814 |
|
|
|
|
|
Author |
Svebor Karaman; Giuseppe Lisanti; Andrew Bagdanov; Alberto del Bimbo |
|
|
Title |
From re-identification to identity inference: Labeling consistency by local similarity constraints |
Type |
Book Chapter |
|
Year |
2014 |
Publication |
Person Re-Identification |
Abbreviated Journal |
|
|
|
Volume |
2 |
Issue |
|
Pages |
287-307 |
|
|
Keywords |
re-identification; Identity inference; Conditional random fields; Video surveillance |
|
|
Abstract |
In this chapter, we introduce the problem of identity inference as a generalization of person re-identification. It is most appropriate to distinguish identity inference from re-identification in situations where a large number of observations must be identified without knowing a priori that groups of test images represent the same individual. The standard single- and multishot person re-identification common in the literature are special cases of our formulation. We present an approach to solving identity inference by modeling it as a labeling problem in a Conditional Random Field (CRF). The CRF model ensures that the final labeling gives similar labels to detections that are similar in feature space. Experimental results are given on the ETHZ, i-LIDS and CAVIAR datasets. Our approach yields state-of-the-art performance for multishot re-identification, and our results on the more general identity inference problem demonstrate that we are able to infer the identity of very many examples even with very few labeled images in the gallery. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer London |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
2191-6586 |
ISBN |
978-1-4471-6295-7 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP; 600.079 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KLB2014b |
Serial |
2521 |
|
|
|
|
|
Author |
Svebor Karaman; Giuseppe Lisanti; Andrew Bagdanov; Alberto del Bimbo |
|
|
Title |
Leveraging local neighborhood topology for large scale person re-identification |
Type |
Journal Article |
|
Year |
2014 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
47 |
Issue |
12 |
Pages |
3767–3778 |
|
|
Keywords |
Re-identification; Conditional random field; Semi-supervised; ETHZ; CAVIAR; 3DPeS; CMV100 |
|
|
Abstract |
In this paper we describe a semi-supervised approach to person re-identification that combines discriminative models of person identity with a Conditional Random Field (CRF) to exploit the local manifold approximation induced by the nearest neighbor graph in feature space. The linear discriminative models learned on few gallery images provide coarse separation of probe images into identities, while a graph topology defined by distances between all person images in feature space leverages local support for label propagation in the CRF. We evaluate our approach using multiple scenarios on several publicly available datasets, where the number of identities varies from 28 to 191 and the number of images ranges between 1,003 and 36,171. We demonstrate that the discriminative model and the CRF are complementary and that the combination of both leads to significant improvement over state-of-the-art approaches. We further demonstrate how the performance of our approach improves with increasing test data and also with increasing amounts of additional unlabeled data. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP; 601.240; 600.079 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KLB2014a |
Serial |
2522 |
|
|
|
|
|
Author |
Svebor Karaman; Andrew Bagdanov; Lea Landucci; Gianpaolo D'Amico; Andrea Ferracani; Daniele Pezzatini; Alberto del Bimbo |
|
|
Title |
Personalized multimedia content delivery on an interactive table by passive observation of museum visitors |
Type |
Journal Article |
|
Year |
2016 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
|
|
Volume |
75 |
Issue |
7 |
Pages |
3787-3811 |
|
|
Keywords |
Computer vision; Video surveillance; Cultural heritage; Multimedia museum; Personalization; Natural interaction; Passive profiling |
|
|
Abstract |
The amount of multimedia data collected in museum databases is growing fast, while the capacity of museums to display information to visitors is acutely limited by physical space. Museums must seek the perfect balance of information given on individual pieces in order to provide sufficient information to aid visitor understanding while maintaining sparse usage of the walls and guaranteeing high appreciation of the exhibit. Moreover, museums often target the interests of average visitors instead of the entire spectrum of different interests each individual visitor might have. Finally, visiting a museum should not be an experience contained in the physical space of the museum but a door opened onto a broader context of related artworks, authors, artistic trends, etc. In this paper we describe the MNEMOSYNE system that attempts to address these issues through a new multimedia museum experience. Based on passive observation, the system builds a profile of the artworks of interest for each visitor. These profiles of interest are then used to drive an interactive table that personalizes multimedia content delivery. The natural user interface on the interactive table uses the visitor’s profile, an ontology of museum content and a recommendation system to personalize exploration of multimedia content. At the end of their visit, the visitor can take home a personalized summary of their visit on a custom mobile application. In this article we describe in detail each component of our approach as well as the first field trials of our prototype system built and deployed at our permanent exhibition space at LeMurate (http://www.lemurate.comune.fi.it/lemurate/) in Florence together with the first results of the evaluation process during the official installation in the National Museum of Bargello (http://www.uffizi.firenze.it/musei/?m=bargello). |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer US |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1380-7501 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP; 601.240; 600.079 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KBL2016 |
Serial |
2520 |
|
|
|
|
|
Author |
Susana Alvarez; Xavier Otazu; Maria Vanrell |
|
|
Title |
Image Segmentation Based on Inter-Feature Distance Maps |
Type |
Book Chapter |
|
Year |
2005 |
Publication |
Frontiers in Artificial Intelligence and Applications, IOS Press, 131: 75–82 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
CIC |
Approved |
no |
|
|
Call Number |
CAT @ cat @ AOV2005 |
Serial |
569 |
|
|
|
|
|
Author |
Susana Alvarez; Maria Vanrell |
|
|
Title |
Texton theory revisited: a bag-of-words approach to combine textons |
Type |
Journal Article |
|
Year |
2012 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
45 |
Issue |
12 |
Pages |
4312-4325 |
|
|
Keywords |
|
|
|
Abstract |
The aim of this paper is to revisit an old theory of texture perception and update its computational implementation by extending it to colour. With this in mind we try to capture the optimality of perceptual systems. This is achieved in the proposed approach by sharing well-known early stages of the visual processes and extracting low-dimensional features that perfectly encode adequate properties for a large variety of textures without needing further learning stages. We propose several descriptors in a bag-of-words framework that are derived from different quantisation models onto the feature spaces. Our perceptual features are directly given by the shape and colour attributes of image blobs, which are the textons. In this way we avoid learning visual words and directly build the vocabularies on these low-dimensional texton spaces. The main differences between the proposed descriptors lie in how co-occurrence of blob attributes is represented in the vocabularies. Our approach outperforms the current state of the art in colour texture description, as shown in several experiments on large texture datasets. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0031-3203 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
CIC |
Approved |
no |
|
|
Call Number |
Admin @ si @ AlV2012a |
Serial |
2130 |
|
|
|
|
|
Author |
Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu |
|
|
Title |
3D Texton Spaces for color-texture retrieval |
Type |
Conference Article |
|
Year |
2010 |
Publication |
7th International Conference on Image Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
6111 |
Issue |
|
Pages |
354–363 |
|
|
Keywords |
|
|
|
Abstract |
Color and texture are visual cues of different nature; integrating them in a useful visual descriptor is not an easy problem. One way to combine both features is to compute spatial texture descriptors independently on each color channel. Another way is to do the integration at the descriptor level. In this case the problem of normalizing both cues arises. In this paper we solve the latter problem by fusing color and texture through distances in texton spaces. Textons are the attributes of image blobs and they are responsible for texture discrimination as defined in Julesz’s Texton theory. We describe them in two low-dimensional and uniform spaces, namely, shape and color. The dissimilarity between color texture images is computed by combining the distances in these two spaces. Following this approach, we propose our TCD descriptor, which outperforms current state-of-the-art methods in the two different approaches mentioned above: early combination with LBP and late combination with MPEG-7. This is shown in an image retrieval experiment over a highly diverse texture dataset from Corel. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
A.C. Campilho and M.S. Kamel |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-13771-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICIAR |
|
|
Notes |
CIC |
Approved |
no |
|
|
Call Number |
CAT @ cat @ ASV2010a |
Serial |
1325 |
|
|
|
|
|
Author |
Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu |
|
|
Title |
Perceptual color texture codebooks for retrieving in highly diverse texture datasets |
Type |
Conference Article |
|
Year |
2010 |
Publication |
20th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
866–869 |
|
|
Keywords |
|
|
|
Abstract |
Color and texture are visual cues of different nature; integrating them in a useful visual descriptor is not an obvious step. One way to combine both features is to compute texture descriptors independently on each color channel. A second way is to integrate the features at the descriptor level, in which case the problem of normalizing both cues arises. Significant progress in object recognition in recent years has provided the bag-of-words framework, which again deals with the problem of feature combination through the definition of vocabularies of visual words. Inspired by this framework, here we present perceptual textons that allow color and texture to be fused at the level of p-blobs, which is our feature detection step. Feature representation is based on two uniform spaces representing the attributes of the p-blobs. The low dimensionality of these texton spaces makes it possible to bypass the usual problems of previous approaches. Firstly, there is no need for normalization between cues; and secondly, vocabularies are directly obtained from the perceptual properties of the texton spaces without any learning step. Our proposal improves on the current state of the art in color-texture descriptors in an image retrieval experiment over a highly diverse texture dataset from Corel. |
|
|
Address |
Istanbul (Turkey) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1051-4651 |
ISBN |
978-1-4244-7542-1 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
CIC |
Approved |
no |
|
|
Call Number |
CAT @ cat @ ASV2010b |
Serial |
1426 |
|
|
|
|
|
Author |
Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu |
|
|
Title |
Low-dimensional and Comprehensive Color Texture Description |
Type |
Journal Article |
|
Year |
2012 |
Publication |
Computer Vision and Image Understanding |
Abbreviated Journal |
CVIU |
|
|
Volume |
116 |
Issue |
1 |
Pages |
54-67 |
|
|
Keywords |
|
|
|
Abstract |
Image retrieval can be addressed by combining standard descriptors, such as those of MPEG-7, which are defined independently for each visual cue (e.g. SCD or CLD for color, HTD for texture or EHD for edges). A common problem is how to combine similarities coming from descriptors that represent different concepts in different spaces. In this paper we propose a color texture description that bypasses this problem from its inherent definition. It is based on a low-dimensional space with 6 perceptual axes. Texture is described in a 3D space derived from a direct implementation of the original Julesz’s Texton theory and color is described in a 3D perceptual space. This early fusion through the blob concept in these two bounded spaces avoids the problem and allows us to derive a sparse color-texture descriptor that achieves similar performance compared to MPEG-7 in image retrieval. Moreover, our descriptor presents comprehensive qualities since it can also be applied either in segmentation or browsing: (a) a dense image representation is defined from the descriptor showing a reasonable performance in locating texture patterns included in complex images; and (b) a vocabulary of basic terms is derived to build an intermediate level descriptor in natural language, improving browsing by bridging the semantic gap. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1077-3142 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
CAT; CIC |
Approved |
no |
|
|
Call Number |
Admin @ si @ ASV2012 |
Serial |
1827 |
|
|
|
|
|
Author |
Susana Alvarez |
|
|
Title |
Revisión de la teoría de los Textons Enfoque computacional en color |
Type |
Book Whole |
|
Year |
2012 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Color and texture are two important visual stimuli for image interpretation. Defining computational descriptors that combine these two features is still an open problem. The difficulty stems essentially from the nature of the two cues: while texture is a property of a region, color is a property of a point.
So far, three types of approaches have been used for the combination: (a) texture is described directly in each of the color channels, (b) texture and color are described separately and combined at the end, and (c) the combination is performed with machine learning techniques. Considering that the human visual system solves this problem at very early stages, this thesis proposes to study the problem through the direct implementation of a perceptual theory, the texton theory, and thus to explore its extension to color.
Since the texton theory describes texture through the densities of local attributes, it fits naturally into the framework of holistic (bag-of-words) descriptors. Several descriptors have been studied, based on different texton spaces and different image representations. The feasibility of these descriptors in an intermediate-level conceptual representation has also been studied.
The proposed descriptors have proven to be very efficient in image retrieval and classification applications, and offer advantages in vocabulary generation. The vocabularies are obtained by directly quantizing low-dimensional spaces, and the perceptual nature of these spaces makes it possible to associate low-level semantics with the visual words. The study of the results leads to the conclusion that, although the holistic approach is very efficient, introducing spatial co-occurrence of the shape and color properties of the image blobs is a key element for their combination, a finding that does not contradict the evidence from perception. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Maria Vanrell; Xavier Otazu |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
CIC |
Approved |
no |
|
|
Call Number |
Alv2012b |
Serial |
2216 |
|