|
Records |
Links |
|
Author |
Marçal Rusiñol; J. Chazalon; Jean-Marc Ogier |
|
|
Title |
Combining Focus Measure Operators to Predict OCR Accuracy in Mobile-Captured Document Images |
Type |
Conference Article |
|
Year |
2014 |
Publication |
11th IAPR International Workshop on Document Analysis and Systems |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
181 - 185 |
|
|
Keywords |
|
|
|
Abstract |
Mobile document image acquisition is a new trend raising serious issues in business document processing workflows. Such digitization procedure is unreliable, and integrates many distortions which must be detected as soon as possible, on the mobile, to avoid paying data transmission fees, and losing information due to the inability to re-capture later a document with temporary availability. In this context, out-of-focus blur is major issue: users have no direct control over it, and it seriously degrades OCR recognition. In this paper, we concentrate on the estimation of focus quality, to ensure a sufficient legibility of a document image for OCR processing. We propose two contributions to improve OCR accuracy prediction for mobile-captured document images. First, we present 24 focus measures, never tested on document images, which are fast to compute and require no training. Second, we show that a combination of those measures enables state-of-the art performance regarding the correlation with OCR accuracy. The resulting approach is fast, robust, and easy to implement in a mobile device. Experiments are performed on a public dataset, and precise details about image processing are given. |
|
|
Address |
Tours; France; April 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4799-3243-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG; 601.223; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RCO2014a |
Serial |
2545 |
|
Permanent link to this record |
|
|
|
|
Author |
Marçal Rusiñol; J. Chazalon; Jean-Marc Ogier |
|
|
Title |
Normalisation et validation d'images de documents capturées en mobilité |
Type |
Conference Article |
|
Year |
2014 |
Publication |
Colloque International Francophone sur l'Écrit et le Document |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
109-124 |
|
|
Keywords |
mobile document image acquisition; perspective correction; illumination correction; quality assessment; focus measure; OCR accuracy prediction |
|
|
Abstract |
Mobile document image acquisition integrates many distortions which must be corrected or detected on the device, before the document becomes unavailable or paying data transmission fees. In this paper, we propose a system to correct perspective and illumination issues, and estimate the sharpness of the image for OCR recognition. The correction step relies on fast and accurate border detection followed by illumination normalization. Its evaluation on a private dataset shows a clear improvement on OCR accuracy. The quality assessment
step relies on a combination of focus measures. Its evaluation on a public dataset shows that this simple method compares well to state of the art, learning-based methods which cannot be embedded on a mobile, and outperforms metric-based methods. |
|
|
Address |
Nancy; France; March 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CIFED |
|
|
Notes |
DAG; 601.223; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RCO2014b |
Serial |
2546 |
|
Permanent link to this record |
|
|
|
|
Author |
Eloi Puertas; Miguel Angel Bautista; Daniel Sanchez; Sergio Escalera; Oriol Pujol |
|
|
Title |
Learning to Segment Humans by Stacking their Body Parts, |
Type |
Conference Article |
|
Year |
2014 |
Publication |
ECCV Workshop on ChaLearn Looking at People |
Abbreviated Journal |
|
|
|
Volume |
8925 |
Issue |
|
Pages |
685-697 |
|
|
Keywords |
Human body segmentation; Stacked Sequential Learning |
|
|
Abstract |
Human segmentation in still images is a complex task due to the wide range of body poses and drastic changes in environmental conditions. Usually, human body segmentation is treated in a two-stage fashion. First, a human body part detection step is performed, and then, human part detections are used as prior knowledge to be optimized by segmentation strategies. In this paper, we present a two-stage scheme based on Multi-Scale Stacked Sequential Learning (MSSL). We define an extended feature set by stacking a multi-scale decomposition of body
part likelihood maps. These likelihood maps are obtained in a first stage
by means of a ECOC ensemble of soft body part detectors. In a second stage, contextual relations of part predictions are learnt by a binary classifier, obtaining an accurate body confidence map. The obtained confidence map is fed to a graph cut optimization procedure to obtain the final segmentation. Results show improved segmentation when MSSL is included in the human segmentation pipeline. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCVW |
|
|
Notes |
HuPBA;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ PBS2014 |
Serial |
2553 |
|
Permanent link to this record |
|
|
|
|
Author |
Marc Bolaños; Maite Garolera; Petia Radeva |
|
|
Title |
Video Segmentation of Life-Logging Videos |
Type |
Conference Article |
|
Year |
2014 |
Publication |
8th Conference on Articulated Motion and Deformable Objects |
Abbreviated Journal |
|
|
|
Volume |
8563 |
Issue |
|
Pages |
1-9 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AMDO |
|
|
Notes |
MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ BGR2014 |
Serial |
2558 |
|
Permanent link to this record |
|
|
|
|
Author |
Francesco Brughi; Debora Gil; Llorenç Badiella; Eva Jove Casabella; Oriol Ramos Terrades |
|
|
Title |
Exploring the impact of inter-query variability on the performance of retrieval systems |
Type |
Conference Article |
|
Year |
2014 |
Publication |
11th International Conference on Image Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
8814 |
Issue |
|
Pages |
413–420 |
|
|
Keywords |
|
|
|
Abstract |
This paper introduces a framework for evaluating the performance of information retrieval systems. Current evaluation metrics provide an average score that does not consider performance variability across the query set. In this manner, conclusions lack of any statistical significance, yielding poor inference to cases outside the query set and possibly unfair comparisons. We propose to apply statistical methods in order to obtain a more informative measure for problems in which different query classes can be identified. In this context, we assess the performance variability on two levels: overall variability across the whole query set and specific query class-related variability. To this end, we estimate confidence bands for precision-recall curves, and we apply ANOVA in order to assess the significance of the performance across different query classes. |
|
|
Address |
Algarve; Portugal; October 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer International Publishing |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-319-11757-7 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICIAR |
|
|
Notes |
IAM; DAG; 600.060; 600.061; 600.077; 600.075 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BGB2014 |
Serial |
2559 |
|
Permanent link to this record |
|
|
|
|
Author |
Marcelo D. Pistarelli; Angel Sappa; Ricardo Toledo |
|
|
Title |
Multispectral Stereo Image Correspondence |
Type |
Conference Article |
|
Year |
2013 |
Publication |
15th International Conference on Computer Analysis of Images and Patterns |
Abbreviated Journal |
|
|
|
Volume |
8048 |
Issue |
|
Pages |
217-224 |
|
|
Keywords |
|
|
|
Abstract |
This paper presents a novel multispectral stereo image correspondence approach. It is evaluated using a stereo rig constructed with a visible spectrum camera and a long wave infrared spectrum camera. The novelty of the proposed approach lies on the usage of Hough space as a correspondence search domain. In this way it avoids searching for correspondence in the original multispectral image domains, where information is low correlated, and a common domain is used. The proposed approach is intended to be used in outdoor urban scenarios, where images contain large amount of edges. These edges are used as distinctive characteristics for the matching in the Hough space. Experimental results are provided showing the validity of the proposed approach. |
|
|
Address |
York; uk; August 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-40245-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CAIP |
|
|
Notes |
ADAS; 600.055 |
Approved |
no |
|
|
Call Number |
Admin @ si @ PST2013 |
Serial |
2561 |
|
Permanent link to this record |
|
|
|
|
Author |
Gioacchino Vino; Angel Sappa |
|
|
Title |
Revisiting Harris Corner Detector Algorithm: a Gradual Thresholding Approach |
Type |
Conference Article |
|
Year |
2013 |
Publication |
10th International Conference on Image Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
7950 |
Issue |
|
Pages |
354-363 |
|
|
Keywords |
|
|
|
Abstract |
This paper presents an adaptive thresholding approach intended to increase the number of detected corners, while reducing the amount of those ones corresponding to noisy data. The proposed approach works by using the classical Harris corner detector algorithm and overcome the difficulty in finding a general threshold that work well for all the images in a given data set by proposing a novel adaptive thresholding scheme. Initially, two thresholds are used to discern between strong corners and flat regions. Then, a region based criteria is used to discriminate between weak corners and noisy points in the midway interval. Experimental results show that the proposed approach has a better capability to reject false corners and, at the same time, to detect weak ones. Comparisons with the state of the art are provided showing the validity of the proposed approach. |
|
|
Address |
Póvoa de Varzim; Portugal; June 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-39093-7 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICIAR |
|
|
Notes |
ADAS; 600.055 |
Approved |
no |
|
|
Call Number |
Admin @ si @ ViS2013 |
Serial |
2562 |
|
Permanent link to this record |
|
|
|
|
Author |
Alejandro Gonzalez Alzate; Gabriel Villalonga; Jiaolong Xu; David Vazquez; Jaume Amores; Antonio Lopez |
|
|
Title |
Multiview Random Forest of Local Experts Combining RGB and LIDAR data for Pedestrian Detection |
Type |
Conference Article |
|
Year |
2015 |
Publication |
IEEE Intelligent Vehicles Symposium IV2015 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
356-361 |
|
|
Keywords |
Pedestrian Detection |
|
|
Abstract |
Despite recent significant advances, pedestrian detection continues to be an extremely challenging problem in real scenarios. In order to develop a detector that successfully operates under these conditions, it becomes critical to leverage upon multiple cues, multiple imaging modalities and a strong multi-view classifier that accounts for different pedestrian views and poses. In this paper we provide an extensive evaluation that gives insight into how each of these aspects (multi-cue, multimodality and strong multi-view classifier) affect performance both individually and when integrated together. In the multimodality component we explore the fusion of RGB and depth maps obtained by high-definition LIDAR, a type of modality that is only recently starting to receive attention. As our analysis reveals, although all the aforementioned aspects significantly help in improving the performance, the fusion of visible spectrum and depth information allows to boost the accuracy by a much larger margin. The resulting detector not only ranks among the top best performers in the challenging KITTI benchmark, but it is built upon very simple blocks that are easy to implement and computationally efficient. These simple blocks can be easily replaced with more sophisticated ones recently proposed, such as the use of convolutional neural networks for feature representation, to further improve the accuracy. |
|
|
Address |
Seoul; Corea; June 2015 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
ACDC |
Expedition |
|
Conference |
IV |
|
|
Notes |
ADAS; 600.076; 600.057; 600.054 |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ GVX2015 |
Serial |
2625 |
|
Permanent link to this record |
|
|
|
|
Author |
P. Wang; V. Eglin; C. Garcia; C. Largeron; Josep Llados; Alicia Fornes |
|
|
Title |
Représentation par graphe de mots manuscrits dans les images pour la recherche par similarité |
Type |
Conference Article |
|
Year |
2014 |
Publication |
Colloque International Francophone sur l'Écrit et le Document |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
233-248 |
|
|
Keywords |
word spotting; graph-based representation; shape context description; graph edit distance; DTW; block merging; query by example |
|
|
Abstract |
Effective information retrieval on handwritten document images has always been
a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labeled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment results introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods. |
|
|
Address |
Nancy; Francia; March 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CIFED |
|
|
Notes |
DAG; 600.061; 602.006; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ WEG2014c |
Serial |
2564 |
|
Permanent link to this record |
|
|
|
|
Author |
Michal Drozdzal; Jordi Vitria; Santiago Segui; Carolina Malagelada; Fernando Azpiroz; Petia Radeva |
|
|
Title |
Intestinal event segmentation for endoluminal video analysis |
Type |
Conference Article |
|
Year |
2014 |
Publication |
21st IEEE International Conference on Image Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
3592 - 3596 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Paris; Francia; October 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICIP |
|
|
Notes |
MILAB; OR;MV |
Approved |
no |
|
|
Call Number |
Admin @ si @ DVS2014 |
Serial |
2565 |
|
Permanent link to this record |
|
|
|
|
Author |
Alicia Fornes; V.C.Kieu; M. Visani; N.Journet; Anjan Dutta |
|
|
Title |
The ICDAR/GREC 2013 Music Scores Competition: Staff Removal |
Type |
Book Chapter |
|
Year |
2014 |
Publication |
Graphics Recognition. Current Trends and Challenges |
Abbreviated Journal |
|
|
|
Volume |
8746 |
Issue |
|
Pages |
207-220 |
|
|
Keywords |
Competition; Graphics recognition; Music scores; Writer identification; Staff removal |
|
|
Abstract |
The first competition on music scores that was organized at ICDAR and GREC in 2011 awoke the interest of researchers, who participated in both staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real case scenario concerning old and degraded music scores. For this purpose, we have generated a new set of semi-synthetic images using two degradation models that we previously introduced: local noise and 3D distortions. In this extended paper we provide an extended description of the dataset, degradation models, evaluation metrics, the participant’s methods and the obtained results that could not be presented at ICDAR and GREC proceedings due to page limitations. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
B.Lamiroy; J.-M. Ogier |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-662-44853-3 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.077; 600.061 |
Approved |
no |
|
|
Call Number |
Admin @ si @ FKV2014 |
Serial |
2581 |
|
Permanent link to this record |
|
|
|
|
Author |
G.Thorvaldsen; Joana Maria Pujadas-Mora; T.Andersen ; L.Eikvil; Josep Llados; Alicia Fornes; Anna Cabre |
|
|
Title |
A Tale of two Transcriptions |
Type |
Journal |
|
Year |
2015 |
Publication |
Historical Life Course Studies |
Abbreviated Journal |
|
|
|
Volume |
2 |
Issue |
|
Pages |
1-19 |
|
|
Keywords |
Nominative Sources; Census; Vital Records; Computer Vision; Optical Character Recognition; Word Spotting |
|
|
Abstract |
non-indexed
This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area; one of the world’s longest series of preserved vital records. Thus, in the Project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned pictures on the Internet, opening up the possibility for further international cooperation concerning automating the transcription of historic source materials. Like what is being done in projects to digitize printed materials, the optimal solution is likely to be a combination of manual transcription and machine-assisted recognition also for hand-written sources. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
2352-6343 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.077; 602.006 |
Approved |
no |
|
|
Call Number |
Admin @ si @ TPA2015 |
Serial |
2582 |
|
Permanent link to this record |
|
|
|
|
Author |
Jiaolong Xu; Sebastian Ramos; David Vazquez; Antonio Lopez |
|
|
Title |
DA-DPM Pedestrian Detection |
Type |
Conference Article |
|
Year |
2013 |
Publication |
ICCV Workshop on Reconstruction meets Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Domain Adaptation; Pedestrian Detection |
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICCVW-RR |
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
Admin @ si @ XRV2013 |
Serial |
2569 |
|
Permanent link to this record |
|
|
|
|
Author |
Gabriel Villalonga; Sebastian Ramos; German Ros; David Vazquez; Antonio Lopez |
|
|
Title |
3d Pedestrian Detection via Random Forest |
Type |
Miscellaneous |
|
Year |
2014 |
Publication |
European Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
231-238 |
|
|
Keywords |
Pedestrian Detection |
|
|
Abstract |
Our demo focuses on showing the extraordinary performance of our novel 3D pedestrian detector along with its simplicity and real-time capabilities. This detector has been designed for autonomous driving applications, but it can also be applied in other scenarios that cover both outdoor and indoor applications.
Our pedestrian detector is based on the combination of a random forest classifier with HOG-LBP features and the inclusion of a preprocessing stage based on 3D scene information in order to precisely determinate the image regions where the detector should search for pedestrians. This approach ends up in a high accurate system that runs real-time as it is required by many computer vision and robotics applications. |
|
|
Address |
Zurich; suiza; September 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCV-Demo |
|
|
Notes |
ADAS; 600.076 |
Approved |
no |
|
|
Call Number |
Admin @ si @ VRR2014 |
Serial |
2570 |
|
Permanent link to this record |
|
|
|
|
Author |
Antonio Clavelli |
|
|
Title |
A computational model of eye guidance, searching for text in real scene images |
Type |
Book Whole |
|
Year |
2014 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Searching for text objects in real scene images is an open problem and a very active computer vision research area. A large number of methods have been proposed tackling the text search as extension of the ones from the document analysis field or inspired by general purpose object detection methods. However the general problem of object search in real scene images remains an extremely challenging problem due to the huge variability in object appearance. This thesis builds on top of the most recent findings in the visual attention literature presenting a novel computational model of eye guidance aiming to better describe text object search in real scene images.
First are presented the relevant state-of-the-art results from the visual attention literature regarding eye movements and visual search. Relevant models of attention are discussed and integrated with recent observations on the role of top-down constraints and the emerging need for a layered model of attention in which saliency is not the only factor guiding attention. Visual attention is then explained by the interaction of several modulating factors, such as objects, value, plans and saliency. Then we introduce our probabilistic formulation of attention deployment in real scene. The model is based on the rationale that oculomotor control depends on two interacting but distinct processes: an attentional process that assigns value to the sources of information and motor process that flexibly links information with action.
In such framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the reward of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects.
In the experimental section the model is tested in laboratory condition, comparing model simulations with data from eye tracking experiments. The comparison is qualitative in terms of observable scan paths and quantitative in terms of statistical similarity of gaze shift amplitude. Experiments are performed using eye tracking data from both a publicly available dataset of face and text and from newly performed eye-tracking experiments on a dataset of street view pictures containing text. The last part of this thesis is dedicated to study the extent to which the proposed model can account for human eye movements in a low constrained setting. We used a mobile eye tracking device and an ad-hoc developed methodology to compare model simulated eye data with the human eye data from mobile eye tracking recordings. Such setting allow to test the model in an incomplete visual information condition, reproducing a close to real-life search task. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Dimosthenis Karatzas;Giuseppe Boccignone;Josep Llados |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-940902-6-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Cla2014 |
Serial |
2571 |
|
Permanent link to this record |