|
Author |
Adarsh Tiwari; Sanket Biswas; Josep Llados |
|
|
Title |
Can Pre-trained Language Models Help in Understanding Handwritten Symbols? |
Type |
Conference Article |
|
Year |
2023 |
Publication |
17th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
14193 |
Issue |
|
Pages |
199–211 |
|
|
Keywords |
|
|
|
Abstract |
The emergence of transformer models such as BERT, GPT-2, GPT-3, RoBERTa, and T5 for natural language understanding tasks has opened the floodgates towards solving a wide array of machine learning tasks in other modalities such as images, audio, music, and sketches. These language models are domain-agnostic and, as a result, can be applied to 1-D sequences of any kind. However, the key challenge lies in bridging the modality gap so that they can generate strong features beneficial for out-of-domain tasks. This work focuses on leveraging the power of such pre-trained language models and discusses the difficulties of predicting challenging handwritten symbols and alphabets. |
|
|
Address |
San Jose; CA; USA; August 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ TBL2023 |
Serial |
3908 |
|
Permanent link to this record |
|
|
|
|
Author |
Alicia Fornes; Xavier Otazu; Josep Llados |
|
|
Title |
Show through cancellation and image enhancement by multiresolution contrast processing |
Type |
Conference Article |
|
Year |
2013 |
Publication |
12th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
200-204 |
|
|
Keywords |
|
|
|
Abstract |
Historical documents suffer from different types of degradation and noise, such as background variation, uneven illumination, or dark spots. In the case of double-sided documents, another common problem is that the back side of the document usually interferes with the front side because of the transparency of the document or ink bleeding. This effect is called the show-through phenomenon. Many methods have been developed to solve these problems; in the case of show-through, they typically scan and match both the front and back sides of the document. In contrast, our approach is designed to use only one side of the scanned document. We hypothesize that show-through components are low-contrast, while foreground components are high-contrast. A Multiresolution Contrast (MC) decomposition is presented in order to estimate the contrast of features at different spatial scales. We cancel the show-through phenomenon by thresholding these low-contrast components. This decomposition is also able to enhance the image by removing shadowed areas through weighting of the spatial scales. Results show that the enhanced images improve the readability of the documents, allowing scholars both to recover unreadable words and to resolve ambiguities. |
|
|
Address |
Washington; USA; August 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1520-5363 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 602.006; 600.045; 600.061; 600.052;CIC |
Approved |
no |
|
|
Call Number |
Admin @ si @ FOL2013 |
Serial |
2241 |
|
Permanent link to this record |
|
|
|
|
Author |
Hans Stadthagen-Gonzalez; M. Carmen Parafita; C. Alejandro Parraga; Markus F. Damian |
|
|
Title |
Testing alternative theoretical accounts of code-switching: Insights from comparative judgments of adjective noun order |
Type |
Journal Article |
|
Year |
2019 |
Publication |
International journal of bilingualism: interdisciplinary studies of multilingual behaviour |
Abbreviated Journal |
IJB |
|
|
Volume |
23 |
Issue |
1 |
Pages |
200-220 |
|
|
Keywords |
|
|
|
Abstract |
Objectives:
Spanish and English contrast in adjective–noun word order: for example, brown dress (English) vs. vestido marrón (‘dress brown’, Spanish). According to the Matrix Language Frame model (MLF), word order in code-switched sentences must be compatible with the word order of the matrix language, whereas, working within the Minimalist Program (MP), Cantone and MacSwan arrived at the descriptive generalization that the position of the noun phrase relative to the adjective is determined by the adjective’s language. Our aim is to evaluate the predictions derived from these two models regarding adjective–noun order in Spanish–English code-switched sentences.
Methodology:
We contrasted the predictions from both models regarding the acceptability of code-switched sentences with different adjective–noun orders that were compatible with the MP, the MLF, both, or neither. Acceptability was assessed in Experiment 1 with a 5-point Likert scale and in Experiment 2 with a 2-Alternative Forced Choice (2AFC) task. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
NEUROBIT; no menciona |
Approved |
no |
|
|
Call Number |
Admin @ si @ SPP2019 |
Serial |
3242 |
|
Permanent link to this record |
|
|
|
|
Author |
Jaume Amores; N. Sebe; Petia Radeva |
|
|
Title |
Boosting the distance estimation: Application to the K-Nearest Neighbor Classifier |
Type |
Journal Article |
|
Year |
2006 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
|
|
Volume |
27 |
Issue |
3 |
Pages |
201–209 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS;MILAB |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ ASR2006 |
Serial |
643 |
|
Permanent link to this record |
|
|
|
|
Author |
Miguel Oliveira; Angel Sappa; V. Santos |
|
|
Title |
Unsupervised Local Color Correction for Coarsely Registered Images |
Type |
Conference Article |
|
Year |
2011 |
Publication |
IEEE conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
201-208 |
|
|
Keywords |
|
|
|
Abstract |
The current paper proposes a new parametric local color correction technique. First, several color transfer functions are computed from the output of the mean shift color segmentation algorithm. Second, color influence maps are calculated. Finally, the contributions of the color transfer functions are merged using the weights from the color influence maps. The proposed approach is compared with both global and local color correction approaches. Results show that our method outperforms the technique ranked first in a recent performance evaluation on this topic. Moreover, the proposed approach is computed in about one tenth of the time. |
|
|
Address |
Colorado Springs |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1063-6919 |
ISBN |
978-1-4577-0394-2 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPR |
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
Admin @ si @ OSS2011; ADAS @ adas @ |
Serial |
1766 |
|
Permanent link to this record |
|
|
|
|
Author |
Gemma Sanchez; Josep Llados; K. Tombre |
|
|
Title |
A mean string algorithm to compute the average among a set of 2D shapes |
Type |
Journal Article |
|
Year |
2002 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
|
|
Volume |
23 |
Issue |
1-3 |
Pages |
203–214 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; IF: 0.409 |
Approved |
no |
|
|
Call Number |
DAG @ dag @ SLT2002 |
Serial |
275 |
|
Permanent link to this record |
|
|
|
|
Author |
Pau Riba; Josep Llados; Alicia Fornes; Anjan Dutta |
|
|
Title |
Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases |
Type |
Journal Article |
|
Year |
2017 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
|
|
Volume |
87 |
Issue |
|
Pages |
203-211 |
|
|
Keywords |
|
|
|
Abstract |
Graph-based representations are increasingly used in visual recognition and retrieval because of their representational power compared with classical appearance-based representations. However, retrieving a query graph from a large dataset of graphs implies a high computational complexity. The most important property for large-scale retrieval is that the search time complexity be sub-linear in the number of database examples. With this aim, in this paper we propose a graph indexing formalism applied to visual retrieval. A binary embedding is defined to provide hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Then, each attribute vector is converted to a binary code by applying a binary-valued hash function. Graph retrieval is therefore formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, which is easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in different real scenarios, such as handwritten word spotting in images of historical documents and symbol spotting in architectural floor plans. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.097; 602.006; 603.053; 600.121 |
Approved |
no |
|
|
Call Number |
RLF2017b |
Serial |
2873 |
|
Permanent link to this record |
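
The abstract of the record above (RLF2017b) describes matching graph nodes by the Hamming distance between their binary codes, computed with bitwise operators. The following is a minimal sketch of that comparison step only, assuming each binary code is stored as a Python integer. The function names, the toy database, and the linear scan are illustrative assumptions; the sketch does not reproduce the paper's hash function or its sub-linear indexing scheme.

```python
from typing import Dict, List, Tuple


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bits in which two binary codes differ (XOR, then popcount)."""
    return bin(code_a ^ code_b).count("1")


def nodes_within_radius(
    query_code: int, database: Dict[str, int], max_distance: int
) -> List[Tuple[str, int]]:
    """Return (node_id, distance) pairs whose code lies within max_distance
    of the query code, sorted by distance and then by node id.

    Hypothetical helper: this is a plain linear scan for illustration,
    not the indexing structure described in the abstract.
    """
    hits = []
    for node_id, code in database.items():
        distance = hamming_distance(query_code, code)
        if distance <= max_distance:
            hits.append((node_id, distance))
    return sorted(hits, key=lambda item: (item[1], item[0]))


# Toy database of node codes (made-up values for illustration only).
toy_database = {"n1": 0b1011, "n2": 0b0010, "n3": 0b1111}
matches = nodes_within_radius(0b1010, toy_database, max_distance=1)
```

In this toy example, `matches` contains the nodes `n1` and `n2`, each at distance 1 from the query code, while `n3` (distance 2) is excluded.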
|
|
|
|
Author |
Estefania Talavera; Alexandre Cola; Nicolai Petkov; Petia Radeva |
|
|
Title |
Towards Egocentric Person Re-identification and Social Pattern Analysis. |
Type |
Book Chapter |
|
Year |
2019 |
Publication |
Frontiers in Artificial Intelligence and Applications |
Abbreviated Journal |
|
|
|
Volume |
310 |
Issue |
|
Pages |
203-211 |
|
|
Keywords |
|
|
|
Abstract |
CoRR abs/1905.04073
Wearable cameras capture a first-person view of the daily activities of the camera wearer, offering a visual diary of the user's behaviour. Detecting the appearance of the people the camera user interacts with is of high interest for the analysis of social interactions. Generally speaking, social events, lifestyle and health are highly correlated, but there is a lack of tools to monitor and analyse them. We consider that egocentric vision provides a tool to obtain information about and understand users' social interactions. We propose a model that enables us to evaluate and visualize social traits obtained by analysing the appearance of social interactions within egocentric photostreams. Given sets of egocentric images, we detect the appearance of faces across the days of the camera wearer, and rely on clustering algorithms to group their feature descriptors in order to re-identify persons. The recurrence of detected faces within photostreams allows us to form an idea of the user's social pattern of behaviour. We validated our model over several weeks recorded by different camera wearers. Our findings indicate that social profiles are potentially useful for social behaviour interpretation. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ TCP2019 |
Serial |
3377 |
|
Permanent link to this record |
|
|
|
|
Author |
Agnes Borras; Josep Llados |
|
|
Title |
Corest: A measure of color and space stability to detect salient regions according to human criteria |
Type |
Conference Article |
|
Year |
2009 |
Publication |
5th International Conference on Computer Vision Theory and Applications |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
204-209 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Lisboa, Portugal |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-989-8111-69-2 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
VISAPP |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ BoL2009 |
Serial |
1225 |
|
Permanent link to this record |
|
|
|
|
Author |
David Aldavert; Ricardo Toledo; Arnau Ramisa; Ramon Lopez de Mantaras |
|
|
Title |
Visual Registration Method For A Low Cost Robot: Computer Vision Systems |
Type |
Conference Article |
|
Year |
2009 |
Publication |
7th International Conference on Computer Vision Systems |
Abbreviated Journal |
|
|
|
Volume |
5815 |
Issue |
|
Pages |
204–214 |
|
|
Keywords |
|
|
|
Abstract |
An autonomous mobile robot must face the correspondence or data association problem in order to carry out tasks like place recognition or unknown environment mapping. In order to put two maps into correspondence, most methods estimate the transformation relating the maps from matches established between low-level features extracted from sensor data. However, finding explicit matches between features is a challenging and computationally expensive task. In this paper, we propose a new method to align obstacle maps without searching for explicit matches between features. The maps are obtained from a stereo pair. Then, we use a vocabulary tree approach to identify putative corresponding maps, followed by the Newton minimization algorithm to find the transformation that relates both maps. The proposed method is evaluated in a typical office environment, showing good performance. |
|
|
Address |
Belgium |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-04666-7 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICVS |
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
Admin @ si @ ATR2009b |
Serial |
1247 |
|
Permanent link to this record |
|
|
|
|
Author |
Wenjuan Gong; Jordi Gonzalez; Joao Manuel R. S. Taveres; Xavier Roca |
|
|
Title |
A New Image Dataset on Human Interactions |
Type |
Conference Article |
|
Year |
2012 |
Publication |
7th Conference on Articulated Motion and Deformable Objects |
Abbreviated Journal |
|
|
|
Volume |
7378 |
Issue |
|
Pages |
204-209 |
|
|
Keywords |
|
|
|
Abstract |
This article describes a new still-image dataset dedicated to interactions between people. Human action recognition from still images has been a hot topic recently, but most existing datasets contain actions performed by a single person, such as running, walking, riding bikes, or phoning, with no interactions between people within one image. The dataset collected in this paper concentrates on human interactions between two people, aiming to explore this new topic in the research area of action recognition from still images. |
|
|
Address |
Mallorca |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-31566-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AMDO |
|
|
Notes |
ISE |
Approved |
no |
|
|
Call Number |
Admin @ si @ GGT2012 |
Serial |
2030 |
|
Permanent link to this record |
|
|
|
|
Author |
Angel Sappa; David Geronimo; Fadi Dornaika; Antonio Lopez |
|
|
Title |
Real Time Vehicle Pose Using On-Board Stereo Vision System |
Type |
Conference Article |
|
Year |
2006 |
Publication |
International Conference on Image Analysis and Recognition |
Abbreviated Journal |
ICIAR |
|
|
Volume |
|
Issue |
LNCS 4142 |
Pages |
205–216 |
|
|
Keywords |
|
|
|
Abstract |
This paper presents a robust technique for real-time estimation of both the camera's position and orientation, referred to as the pose. A commercial stereo vision system is used. Unlike previous approaches, it can be used in either urban or highway scenarios. The proposed technique consists of two stages. Initially, a compact 2D representation of the original 3D data points is computed. Then, a RANSAC-based least squares approach is used for fitting a plane to the road. At the same time, the relative camera position and orientation are computed. The proposed technique is intended to be used in a driving assistance scheme for applications such as obstacle or pedestrian detection. Experimental results on urban environments with different road geometries are presented. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ SGD2006b |
Serial |
671 |
|
Permanent link to this record |
|
|
|
|
Author |
Debora Gil; Petia Radeva |
|
|
Title |
Shape Restoration via a Regularized Curvature Flow |
Type |
Journal Article |
|
Year |
2004 |
Publication |
Journal of Mathematical Imaging and Vision |
Abbreviated Journal |
|
|
|
Volume |
21 |
Issue |
3 |
Pages |
205-223 |
|
|
Keywords |
|
|
|
Abstract |
Any image filtering operator designed for automatic shape restoration should satisfy robustness (whatever the nature and degree of noise is) as well as non-trivial smooth asymptotic behavior. Moreover, a stopping criterion should be determined by characteristics of the evolved image rather than dependent on the number of iterations. Among the several PDE based techniques, curvature flows appear to be highly reliable for strongly noisy images compared to image diffusion processes.
In the present paper, we introduce a regularized curvature flow (RCF) that admits non-trivial steady states. It is based on a measure of the local curve smoothness that takes into account regularity of the curve curvature and serves as stopping term in the mean curvature flow. We prove that this measure decreases over the orbits of RCF, which endows the method with a natural stop criterion in terms of the magnitude of this measure. Further, in its discrete version it produces steady states consisting of piece-wise regular curves. Numerical experiments made on synthetic shapes corrupted with different kinds of noise show the abilities and limitations of each of the current geometric flows and the benefits of RCF. Finally, we present results on real images that illustrate the usefulness of the present approach in practical applications. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM;MILAB |
Approved |
no |
|
|
Call Number |
IAM @ iam @ GiR2004c |
Serial |
1532 |
|
Permanent link to this record |
|
|
|
|
Author |
Fahad Shahbaz Khan; Muhammad Anwer Rao; Joost Van de Weijer; Andrew Bagdanov; Antonio Lopez; Michael Felsberg |
|
|
Title |
Coloring Action Recognition in Still Images |
Type |
Journal Article |
|
Year |
2013 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
|
|
Volume |
105 |
Issue |
3 |
Pages |
205-221 |
|
|
Keywords |
|
|
|
Abstract |
In this article we investigate the problem of human action recognition in static images. By action recognition we mean a class of problems which includes both action classification and action detection (i.e. simultaneous localization and classification). Bag-of-words image representations yield promising results for action classification, and deformable part models perform very well in object detection. The representations for action recognition typically use only shape cues and ignore color information. Inspired by the recent success of color in image classification and object detection, we investigate the potential of color for action classification and detection in static images. We perform a comprehensive evaluation of color descriptors and fusion approaches for action recognition. Experiments were conducted on the three datasets most used for benchmarking action recognition in still images: Willow, PASCAL VOC 2010 and Stanford-40. Our experiments demonstrate that incorporating color information considerably improves recognition performance, and that a descriptor based on color names outperforms pure color descriptors. They also show that late fusion of color and shape information outperforms other approaches on action recognition. Finally, we show that the different color–shape fusion approaches result in complementary information and combining them yields state-of-the-art performance for action classification. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer US |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0920-5691 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
CIC; ADAS; 600.057; 600.048 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KRW2013 |
Serial |
2285 |
|
Permanent link to this record |
|
|
|
|
Author |
Marc Bolaños; Alvaro Peris; Francisco Casacuberta; Sergi Solera; Petia Radeva |
|
|
Title |
Egocentric video description based on temporally-linked sequences |
Type |
Journal Article |
|
Year |
2018 |
Publication |
Journal of Visual Communication and Image Representation |
Abbreviated Journal |
JVCIR |
|
|
Volume |
50 |
Issue |
|
Pages |
205-216 |
|
|
Keywords |
egocentric vision; video description; deep learning; multi-modal learning |
|
|
Abstract |
Egocentric vision consists of acquiring images throughout the day from a first-person point of view using wearable cameras. The automatic analysis of this information makes it possible to discover daily patterns for improving the quality of life of the user. A natural topic that arises in egocentric vision is storytelling, that is, how to understand and tell the story lying behind the pictures.
In this paper, we tackle storytelling as an egocentric sequence description problem. We propose a novel methodology that exploits information from temporally neighboring events, matching precisely the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting of a multi-input attention recurrent network. We also release the EDUB-SegDesc dataset. This is the first dataset for egocentric image sequence description, consisting of 1,339 events with 3,991 descriptions, from 55 days acquired by 11 people. Finally, we show that our proposal outperforms classical attentional encoder-decoder methods for video description. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ BPC2018 |
Serial |
3109 |
|
Permanent link to this record |