|
Records |
Links |
|
Author |
Aura Hernandez-Sabate; Lluis Albarracin; Daniel Calvo; Nuria Gorgorio |
|
|
Title |
EyeMath: Identifying Mathematics Problem Solving Processes in a RTS Video Game |
Type |
Conference Article |
|
Year |
2016 |
Publication |
5th International Conference Games and Learning Alliance |
Abbreviated Journal |
|
|
|
Volume |
10056 |
Issue |
|
Pages |
50-59 |
|
|
Keywords |
Simulation environment; Automated Driving; Driver-Vehicle interaction |
|
|
Abstract |
Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment with the aim to highlight its utility for level 3 driving automation. In particular, an experiment is performed using the presented simulator to explore the influence of different variables regarding control transfer of the car after the system was driving autonomously in a highway scenario. The results show that the speed of the car at the time when the system needs to transfer the control to the human driver is critical. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
GALA |
|
|
Notes |
ADAS;IAM; |
Approved |
no |
|
|
Call Number |
HAC2016 |
Serial |
2864 |
|
Permanent link to this record |
|
|
|
|
Author |
Azadeh S. Mozafari; David Vazquez; Mansour Jamzad; Antonio Lopez |
|
|
Title |
Node-Adapt, Path-Adapt and Tree-Adapt:Model-Transfer Domain Adaptation for Random Forest |
Type |
Miscellaneous |
|
Year |
2016 |
Publication |
Arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Domain Adaptation; Pedestrian detection; Random Forest |
|
|
Abstract |
Random Forest (RF) is a successful paradigm for learning classifiers due to its ability to learn from large feature spaces and seamlessly integrate multi-class classification, as well as the achieved accuracy and processing efficiency. However, as many other classifiers, RF requires domain adaptation (DA) provided that there is a mismatch between the training (source) and testing (target) domains which provokes classification degradation. Consequently, different RF-DA methods have been proposed, which not only require target-domain samples but revisiting the source-domain ones, too. As novelty, we propose three inherently different methods (Node-Adapt, Path-Adapt and Tree-Adapt) that only require the learned source-domain RF and a relatively few target-domain samples for DA, i.e. source-domain samples do not need to be available. To assess the performance of our proposals we focus on image-based object detection, using the pedestrian detection problem as challenging proof-of-concept. Moreover, we use the RF with expert nodes because it is a competitive patch-based pedestrian model. We test our Node-, Path- and Tree-Adapt methods in standard benchmarks, showing that DA is largely achieved. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ MVJ2016 |
Serial |
2868 |
|
Permanent link to this record |
|
|
|
|
Author |
H. Martin Kjer; Jens Fagertun; Sergio Vera; Debora Gil; Miguel Angel Gonzalez Ballester; Rasmus R. Paulsena |
|
|
Title |
Free-form image registration of human cochlear uCT data using skeleton similarity as anatomical prior |
Type |
Journal Article |
|
Year |
2016 |
Publication |
Patter Recognition Letters |
Abbreviated Journal |
PRL |
|
|
Volume |
76 |
Issue |
1 |
Pages |
76-82 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM; 600.060 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MFV2017b |
Serial |
2941 |
|
Permanent link to this record |
|
|
|
|
Author |
Marc Sunset Perez; Marc Comino Trinidad; Dimosthenis Karatzas; Antonio Chica Calaf; Pere Pau Vazquez Alcocer |
|
|
Title |
Development of general‐purpose projection‐based augmented reality systems |
Type |
Journal |
|
Year |
2016 |
Publication |
IADIs international journal on computer science and information systems |
Abbreviated Journal |
IADIs |
|
|
Volume |
11 |
Issue |
2 |
Pages |
1-18 |
|
|
Keywords |
|
|
|
Abstract |
Despite the large amount of methods and applications of augmented reality, there is little homogenizatio n on the software platforms that support them. An exception may be the low level control software that is provided by some high profile vendors such as Qualcomm and Metaio. However, these provide fine grain modules for e.g. element tracking. We are more co ncerned on the application framework, that includes the control of the devices working together for the development of the AR experience. In this paper we describe the development of a software framework for AR setups. We concentrate on the modular design of the framework, but also on some hard problems such as the calibration stage, crucial for projection – based AR. The developed framework is suitable and has been tested in AR applications using camera – projector pairs, for both fixed and nomadic setups |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.084 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SCK2016 |
Serial |
2890 |
|
Permanent link to this record |
|
|
|
|
Author |
Lluis Gomez |
|
|
Title |
Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding |
Type |
Book Whole |
|
Year |
2016 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This thesis addresses the problem of automatic scene text understanding in unconstrained conditions. In particular, we tackle the tasks of multi-language and arbitrary-oriented text detection, tracking, and script identification in natural scenes.
For this we have developed a set of generic methods that build on top of the basic observation that text has always certain key visual and structural characteristics that are independent of the language or script in which it is written. Text instances in any
language or script are always formed as groups of similar atomic parts, being them either individual characters, small stroke parts, or even whole words in the case of cursive text. This holistic (sumof-parts) and recursive perspective has lead us to explore different variants of the “segmentation and grouping” paradigm of computer vision.
Scene text detection methodologies are usually based in classification of individual regions or patches, using a priory knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organization through which
text emerges as a perceptually significant group of atomic objects.
In this thesis, we argue that the text detection problem must be posed as the detection of meaningful groups of regions. We address the problem of text detection in natural scenes from a hierarchical perspective, making explicit use of the recursive nature of text, aiming directly to the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such an hierarchy introducing a feature space designed to produce text group hypothese with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based in perceptual organization. Within this generic framework, we design a text-specific object proposals algorithm that, contrary to existing generic object proposals methods, aims directly to the detection of text regions groupings. For this, we abandon the rigid definition of “what is text” of traditional specialized text detectors, and move towards more fuzzy perspective of grouping-based object proposals methods.
Then, we present a hybrid algorithm for detection and tracking of scene text where the notion of region groupings plays also a central role. By leveraging the structural arrangement of text group components between consecutive frames we can improve
the overall tracking performance of the system.
Finally, since our generic detection framework is inherently designed for multi-language environments, we focus on the problem of script identification in order to build a multi-language end-toend reading system. Facing this problem with state of the art CNN classifiers is not straightforward, as they fail to address a key
characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed size as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
|
Place of Publication |
|
Editor |
Dimosthenis Karatzas |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ Gom2016 |
Serial |
2891 |
|
Permanent link to this record |
|
|
|
|
Author |
Alicia Fornes; Josep Llados; Oriol Ramos Terrades; Marçal Rusiñol |
|
|
Title |
La Visió per Computador com a Eina per a la Interpretació Automàtica de Fonts Documentals |
Type |
Journal |
|
Year |
2016 |
Publication |
Lligall, Revista Catalana d'Arxivística |
Abbreviated Journal |
|
|
|
Volume |
39 |
Issue |
|
Pages |
20-46 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.097 |
Approved |
no |
|
|
Call Number |
Admin @ si @ FLR2016 |
Serial |
2897 |
|
Permanent link to this record |
|
|
|
|
Author |
Joana Maria Pujadas-Mora; Alicia Fornes; Josep Llados; Anna Cabre |
|
|
Title |
Bridging the gap between historical demography and computing: tools for computer-assisted transcription and the analysis of demographic sources |
Type |
Book Chapter |
|
Year |
2016 |
Publication |
The future of historical demography. Upside down and inside out |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
127-131 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Acco Publishers |
Place of Publication |
|
Editor |
K.Matthijs; S.Hin; H.Matsuo; J.Kok |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-94-6292-722-3 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.097 |
Approved |
no |
|
|
Call Number |
Admin @ si @ PFL2016 |
Serial |
2907 |
|
Permanent link to this record |
|
|
|
|
Author |
Miguel Oliveira; Victor Santos; Angel Sappa; P. Dias; A. Moreira |
|
|
Title |
Incremental texture mapping for autonomous driving |
Type |
Journal Article |
|
Year |
2016 |
Publication |
Robotics and Autonomous Systems |
Abbreviated Journal |
RAS |
|
|
Volume |
84 |
Issue |
|
Pages |
113-128 |
|
|
Keywords |
Scene reconstruction; Autonomous driving; Texture mapping |
|
|
Abstract |
Autonomous vehicles have a large number of on-board sensors, not only for providing coverage all around the vehicle, but also to ensure multi-modality in the observation of the scene. Because of this, it is not trivial to come up with a single, unique representation that feeds from the data given by all these sensors. We propose an algorithm which is capable of mapping texture collected from vision based sensors onto a geometric description of the scenario constructed from data provided by 3D sensors. The algorithm uses a constrained Delaunay triangulation to produce a mesh which is updated using a specially devised sequence of operations. These enforce a partial configuration of the mesh that avoids bad quality textures and ensures that there are no gaps in the texture. Results show that this algorithm is capable of producing fine quality textures. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS; 600.086 |
Approved |
no |
|
|
Call Number |
Admin @ si @ OSS2016b |
Serial |
2912 |
|
Permanent link to this record |
|
|
|
|
Author |
Wenjuan Gong; Xuena Zhang; Jordi Gonzalez; Andrews Sobral; Thierry Bouwmans; Changhe Tu; El-hadi Zahzah |
|
|
Title |
Human Pose Estimation from Monocular Images: A Comprehensive Survey |
Type |
Journal Article |
|
Year |
2016 |
Publication |
Sensors |
Abbreviated Journal |
SENS |
|
|
Volume |
16 |
Issue |
12 |
Pages |
1966 |
|
|
Keywords |
human pose estimation; human bodymodels; generativemethods; discriminativemethods; top-down methods; bottom-up methods |
|
|
Abstract |
Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but they focus on a certain category; for example, model-based approaches or human motion analysis, etc. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out including milestone works and recent advancements. Based on one standard pipeline for the solution of computer vision problems, this survey splits the problem into several modules: feature extraction and description, human body models, and modeling
methods. Problem modeling methods are approached based on two means of categorization in this survey. One way to categorize includes top-down and bottom-up methods, and another way includes generative and discriminative methods. Considering the fact that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections for motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and provides error measurement methods that are frequently used. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE; 600.098; 600.119 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GZG2016 |
Serial |
2933 |
|
Permanent link to this record |
|
|
|
|
Author |
Anastasios Doulamis; Nikolaos Doulamis; Marco Bertini; Jordi Gonzalez; Thomas B. Moeslund |
|
|
Title |
Introduction to the Special Issue on the Analysis and Retrieval of Events/Actions and Workflows in Video Streams |
Type |
Journal Article |
|
Year |
2016 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
|
|
Volume |
75 |
Issue |
22 |
Pages |
14985-14990 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE; HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ DDB2016 |
Serial |
2934 |
|
Permanent link to this record |
|
|
|
|
Author |
Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades |
|
|
Title |
Sparse representation over learned dictionary for symbol recognition |
Type |
Journal Article |
|
Year |
2016 |
Publication |
Signal Processing |
Abbreviated Journal |
SP |
|
|
Volume |
125 |
Issue |
|
Pages |
36-47 |
|
|
Keywords |
Symbol Recognition; Sparse Representation; Learned Dictionary; Shape Context; Interest Points |
|
|
Abstract |
In this paper we propose an original sparse vector model for symbol retrieval task. More specically, we apply the K-SVD algorithm for learning a visual dictionary based on symbol descriptors locally computed around interest points. Results on benchmark datasets show that the obtained sparse representation is competitive related to state-of-the-art methods. Moreover, our sparse representation is invariant to rotation and scale transforms and also robust to degraded images and distorted symbols. Thereby, the learned visual dictionary is able to represent instances of unseen classes of symbols. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.061; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ DTR2016 |
Serial |
2946 |
|
Permanent link to this record |
|
|
|
|
Author |
Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades |
|
|
Title |
Spotting Symbol over Graphical Documents Via Sparsity in Visual Vocabulary |
Type |
Book Chapter |
|
Year |
2016 |
Publication |
Recent Trends in Image Processing and Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
709 |
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
RTIP2R |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ HTR2016 |
Serial |
2956 |
|
Permanent link to this record |
|
|
|
|
Author |
Marta Diez-Ferrer; Debora Gil; Elena Carreño; Susana Padrones; Samantha Aso; Vanesa Vicens; Cubero Noelia; Rosa Lopez Lisbona; Carles Sanchez; Agnes Borras; Antoni Rosell |
|
|
Title |
Positive Airway Pressure-Enhanced CT to Improve Virtual Bronchoscopic Navigation |
Type |
Journal Article |
|
Year |
2016 |
Publication |
Chest Journal |
Abbreviated Journal |
CHEST |
|
|
Volume |
150 |
Issue |
4 |
Pages |
1003A |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM; 600.096; 600.075 |
Approved |
no |
|
|
Call Number |
Admin @ si @ DGC2016 |
Serial |
3099 |
|
Permanent link to this record |
|
|
|
|
Author |
Maria Oliver; Gloria Haro; Mariella Dimiccoli; Baptiste Mazin; Coloma Ballester |
|
|
Title |
A computational model of amodal completion |
Type |
Conference Article |
|
Year |
2016 |
Publication |
SIAM Conference on Imaging Science |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This paper presents a computational model to recover the most likely interpretation of the 3D scene structure from a planar image, where some objects may occlude others. The estimated scene interpretation is obtained by integrating some global and local cues and provides both the complete disoccluded objects that form the scene and their ordering according to depth. Our method first computes several distal scenes which are compatible with the proximal planar image. To compute these different hypothesized scenes, we propose a perceptually inspired object disocclusion method, which works by minimizing the Euler's elastica as well as by incorporating the relatability of partially occluded contours and the convexity of the disoccluded objects. Then, to estimate the preferred scene we rely on a Bayesian model and define probabilities taking into account the global complexity of the objects in the hypothesized scenes as well as the effort of bringing these objects in their relative position in the planar image, which is also measured by an Euler's elastica-based quantity. The model is illustrated with numerical experiments on, both, synthetic and real images showing the ability of our model to reconstruct the occluded objects and the preferred perceptual order among them. We also present results on images of the Berkeley dataset with provided figure-ground ground-truth labeling. |
|
|
Address |
Albuquerque; New Mexico; USA; May 2016 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
IS |
|
|
Notes |
MILAB; 601.235 |
Approved |
no |
|
|
Call Number |
Admin @ si @OHD2016a |
Serial |
2788 |
|
Permanent link to this record |
|
|
|
|
Author |
Y. Patel; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas |
|
|
Title |
Dynamic Lexicon Generation for Natural Scene Images |
Type |
Conference Article |
|
Year |
2016 |
Publication |
14th European Conference on Computer Vision Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
395-410 |
|
|
Keywords |
scene text; photo OCR; scene understanding; lexicon generation; topic modeling; CNN |
|
|
Abstract |
Many scene text understanding methods approach the endtoend recognition problem from a word-spotting perspective and take huge benet from using small per-image lexicons. Such customized lexicons are normally assumed as given and their source is rarely discussed.
In this paper we propose a method that generates contextualized lexicons
for scene images using only visual information. For this, we exploit
the correlation between visual and textual information in a dataset consisting
of images and textual content associated with them. Using the topic modeling framework to discover a set of latent topics in such a dataset allows us to re-rank a xed dictionary in a way that prioritizes the words that are more likely to appear in a given image. Moreover, we train a CNN that is able to reproduce those word rankings but using only the image raw pixels as input. We demonstrate that the quality of the automatically obtained custom lexicons is superior to a generic frequency-based baseline. |
|
|
Address |
Amsterdam; The Netherlands; October 2016 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCVW |
|
|
Notes |
DAG; 600.084 |
Approved |
no |
|
|
Call Number |
Admin @ si @ PGR2016 |
Serial |
2825 |
|
Permanent link to this record |