Records |
Author |
Bogdan Raducanu; Jordi Vitria |
Title |
Incremental Subspace Learning for Cognitive Visual Processes |
Type |
Conference Article |
Year |
2007 |
Publication |
Advances in Brain, Vision and Artificial Intelligence, 2nd International Symposium |
Abbreviated Journal |
|
Volume |
4729 |
Issue |
|
Pages |
214–223 |
Keywords |
|
Abstract |
|
Address |
Naples (Italy) |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
BVAI’07 |
Notes |
OR;MV |
Approved |
no |
Call Number |
BCNPCL @ bcnpcl @ RaV2007b |
Serial |
901 |
|
|
|
Author |
Mohamed Ramzy Ibrahim; Robert Benavente; Daniel Ponsa; Felipe Lumbreras |
Title |
Unveiling the Influence of Image Super-Resolution on Aerial Scene Classification |
Type |
Conference Article |
Year |
2023 |
Publication |
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications |
Abbreviated Journal |
|
Volume |
14469 |
Issue |
|
Pages |
214–228 |
Keywords |
|
Abstract |
Deep learning has made significant advances in recent years and can now achieve outstanding results in tasks requiring visual understanding of scenes. However, its performance tends to decline when dealing with low-quality images. Super-resolution (SR) techniques have started to have an impact on the field of remote sensing by enabling the restoration of fine details and enhancing image quality, which could help to increase performance in other vision tasks. However, previous works reported contradictory results for visual scene understanding when SR techniques were applied. In this paper, we present an experimental study on the impact of SR on enhancing aerial scene classification. Through the analysis of different state-of-the-art SR algorithms, including traditional methods and deep learning-based approaches, we unveil the transformative potential of SR in overcoming the limitations of low-resolution (LR) aerial imagery. Enhancing spatial resolution captures finer details, opening the door to improved scene understanding. We also discuss the effect of different image scales on the quality of SR and its effect on aerial scene classification. Our experimental work demonstrates the significant impact of SR on enhancing aerial scene classification compared to LR images, opening new avenues for improved remote sensing applications. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CIARP |
Notes |
MSIAU |
Approved |
no |
Call Number |
Admin @ si @ IBP2023 |
Serial |
4008 |
|
|
|
Author |
Marçal Rusiñol; Josep Llados |
Title |
Efficient Logo Retrieval Through Hashing Shape Context Descriptors |
Type |
Conference Article |
Year |
2010 |
Publication |
9th IAPR International Workshop on Document Analysis Systems |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
215–222 |
Keywords |
|
Abstract |
In this paper, we present an approach towards the retrieval of words from graphical document images. In graphical documents, word indexing is a challenging task due to the presence of multi-oriented characters in unstructured layouts. The proposed approach uses recognition results of individual components to form character pairs with the neighboring components. An indexing scheme is designed to store the spatial description of components and to access them efficiently. Given a query text word (ASCII/Unicode format), the character pairs present in it are searched in the document. Next, the retrieved character pairs are linked sequentially to form character strings. Dynamic programming is applied to find different instances of query words, using a string edit distance as the objective function to match the query word. Recognition of multi-scale and multi-oriented character components is performed using a Support Vector Machine (SVM) classifier. To handle multi-oriented character strings, the features used in the SVM are invariant to character orientation. Experimental results show that the method efficiently locates query words in multi-oriented text in graphical documents. |
Address |
Boston; USA |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
DAS |
Notes |
DAG |
Approved |
no |
Call Number |
DAG @ dag @ RuL2010b |
Serial |
1434 |
|
|
|
Author |
Oscar Amoros; Sergio Escalera; Anna Puig |
Title |
Adaboost GPU-based Classifier for Direct Volume Rendering |
Type |
Conference Article |
Year |
2011 |
Publication |
International Conference on Computer Graphics Theory and Applications |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
215–219 |
Keywords |
|
Abstract |
In volume visualization, voxel visibility and materials are determined through interactive editing of the transfer function. In this paper, we present a two-level GPU-based labeling method that computes, at rendering time, a set of labeled structures using the Adaboost machine learning classifier. In a pre-processing step, Adaboost trains a binary classifier from a pre-labeled dataset, taking into account a set of features for each sample. This binary classifier is a weighted combination of weak classifiers, each of which can be expressed as a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied to the features of a set of unlabeled samples. We propose an alternative representation of these classifiers that allows a GPU-based parallelized testing stage embedded into the visualization pipeline. The empirical results show that OpenCL-based classification of biomedical datasets is a challenging problem that opens opportunities for further research. |
Address |
Algarve, Portugal |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
GRAPP |
Notes |
MILAB; HuPBA |
Approved |
no |
Call Number |
Admin @ si @ AEP2011 |
Serial |
1774 |
|
|
|
Author |
David Roche; Debora Gil; Jesus Giraldo |
Title |
An inference model for analyzing termination conditions of Evolutionary Algorithms |
Type |
Conference Article |
Year |
2011 |
Publication |
14th Congrés Català en Intel·ligència Artificial |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
216–225 |
Keywords |
Evolutionary Computation Convergence, Termination Conditions, Statistical Inference |
Abstract |
In real-world problems, it is mandatory to design a termination condition for Evolutionary Algorithms (EAs) that ensures stabilization close to the unknown optimum. Distribution-based quantities are good candidates, provided that suitable parameters are used. A major limitation for their application to real-world problems is that such parameters strongly depend on the topology of the objective function as well as on the EA paradigm used.
We claim that the termination problem would be fully solved if we had a model measuring to what extent a distribution-based quantity asymptotically behaves like the solution accuracy. We present a regression-prediction model that relates any two given quantities and reports whether they can be statistically swapped as termination conditions. Our framework is applied to two issues: first, exploring whether the parameters involved in the computation of distribution-based quantities influence their asymptotic behavior; second, determining to what extent existing distribution-based quantities can be asymptotically exchanged for the accuracy of the EA solution. |
Address |
Lleida, Catalonia (Spain) |
Corporate Author |
Associació Catalana Intel·ligència Artificial |
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
978-1-60750-841-0 |
Medium |
|
Area |
|
Expedition |
|
Conference |
CCIA |
Notes |
IAM |
Approved |
no |
Call Number |
IAM @ iam @ RGG2011a |
Serial |
1677 |
|
|
|
Author |
Jaume Gibert; Ernest Valveny; Horst Bunke |
Title |
Vocabulary Selection for Graph of Words Embedding |
Type |
Conference Article |
Year |
2011 |
Publication |
5th Iberian Conference on Pattern Recognition and Image Analysis |
Abbreviated Journal |
|
Volume |
6669 |
Issue |
|
Pages |
216–223 |
Keywords |
|
Abstract |
The Graph of Words Embedding consists of mapping every graph in a given dataset to a feature vector by counting unary and binary relations between node attributes of the graph. It has been shown to perform well for graphs with discrete label alphabets. In this paper, we extend the methodology to graphs with n-dimensional continuous attributes by selecting node representatives. We propose three different discretization procedures for the attribute space and experimentally evaluate the dependence on both the selector and the number of node representatives. In the context of graph classification, the experimental results reveal that on two out of three public databases the proposed extension achieves superior performance over a standard reference system. |
Address |
Las Palmas de Gran Canaria, Spain |
Corporate Author |
|
Thesis |
|
Publisher |
Springer |
Place of Publication |
Berlin |
Editor |
Vitria, Jordi; Sanches, João Miguel Raposo; Hernández, Mario |
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
978-3-642-21256-7 |
Medium |
|
Area |
|
Expedition |
|
Conference |
IbPRIA |
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ GVB2011b |
Serial |
1744 |
|
|
|
Author |
Marcelo D. Pistarelli; Angel Sappa; Ricardo Toledo |
Title |
Multispectral Stereo Image Correspondence |
Type |
Conference Article |
Year |
2013 |
Publication |
15th International Conference on Computer Analysis of Images and Patterns |
Abbreviated Journal |
|
Volume |
8048 |
Issue |
|
Pages |
217–224 |
Keywords |
|
Abstract |
This paper presents a novel multispectral stereo image correspondence approach. It is evaluated using a stereo rig built with a visible spectrum camera and a long-wave infrared spectrum camera. The novelty of the proposed approach lies in the use of the Hough space as the correspondence search domain. In this way, it avoids searching for correspondences in the original multispectral image domains, where information is poorly correlated, and uses a common domain instead. The proposed approach is intended for outdoor urban scenarios, where images contain a large number of edges. These edges are used as distinctive features for matching in the Hough space. Experimental results are provided showing the validity of the proposed approach. |
Address |
York; UK; August 2013 |
Corporate Author |
|
Thesis |
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
0302-9743 |
ISBN |
978-3-642-40245-6 |
Medium |
|
Area |
|
Expedition |
|
Conference |
CAIP |
Notes |
ADAS; 600.055 |
Approved |
no |
Call Number |
Admin @ si @ PST2013 |
Serial |
2561 |
|
|
|
Author |
Cristina Palmero; Albert Clapes; Chris Bahnsen; Andreas Møgelmose; Thomas B. Moeslund; Sergio Escalera |
Title |
Multi-modal RGB-Depth-Thermal Human Body Segmentation |
Type |
Journal Article |
Year |
2016 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
Volume |
118 |
Issue |
2 |
Pages |
217–239 |
Keywords |
Human body segmentation; RGB; Depth; Thermal |
Abstract |
This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB–depth–thermal dataset along with a multi-modal segmentation baseline. The modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, partitions the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extractors, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian mixture models for the probabilistic modeling and Random Forest for the stacked learning, outperforms other state-of-the-art methods, obtaining an overlap above 75% on the novel dataset when compared to the manually annotated ground truth of human segmentations. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
Springer US |
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HuPBA;MILAB; |
Approved |
no |
Call Number |
Admin @ si @ PCB2016 |
Serial |
2767 |
|
|
|
Author |
Alicia Fornes; Asma Bensalah; Cristina Carmona-Duarte; Jialuo Chen; Miguel A. Ferrer; Andreas Fischer; Josep Llados; Cristina Martin; Eloy Opisso; Rejean Plamondon; Anna Scius-Bertrand; Josep Maria Tormos |
Title |
The RPM3D Project: 3D Kinematics for Remote Patient Monitoring |
Type |
Conference Article |
Year |
2022 |
Publication |
Intertwining Graphonomics with Human Movements. 20th International Conference of the International Graphonomics Society, IGS 2022 |
Abbreviated Journal |
|
Volume |
13424 |
Issue |
|
Pages |
217–226 |
Keywords |
Healthcare applications; Kinematic; Theory of Rapid Human Movements; Human activity recognition; Stroke rehabilitation; 3D kinematics |
Abstract |
This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real-world scenario for stroke rehabilitation at the Guttmann Institute (https://www.guttmann.com/en/) (neurorehabilitation hospital), showing promising results. Our work could have a great impact on remote healthcare applications, improving medical efficiency and reducing healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases. |
Address |
June 7-9, 2022, Las Palmas de Gran Canaria, Spain |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
IGS |
Notes |
DAG; 600.121; 600.162; 602.230; 600.140 |
Approved |
no |
Call Number |
Admin @ si @ FBC2022 |
Serial |
3739 |
|
|
|
Author |
C. Alejandro Parraga; Robert Benavente; Maria Vanrell; Ramon Baldrich |
Title |
Modelling Inter-Colour Regions of Colour Naming Space |
Type |
Conference Article |
Year |
2008 |
Publication |
4th European Conference on Colour in Graphics, Imaging and Vision Proceedings |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
218–222 |
Keywords |
|
Abstract |
|
Address |
Terrassa (Spain) |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CGIV08 |
Notes |
CAT;CIC |
Approved |
no |
Call Number |
CAT @ cat @ PBV2008 |
Serial |
969 |
|
|
|
Author |
Adam Fodor; Rachid R. Saboundji; Julio C. S. Jacques Junior; Sergio Escalera; David Gallardo Pujol; Andras Lorincz |
Title |
Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures |
Type |
Conference Article |
Year |
2022 |
Publication |
Understanding Social Behavior in Dyadic and Small Group Interactions |
Abbreviated Journal |
|
Volume |
173 |
Issue |
|
Pages |
218–241 |
Keywords |
|
Abstract |
Human-machine interaction, human-robot interaction, and collaboration appear in diverse fields, from homecare to Cyber-Physical Systems. Technological development is fast, whereas real-time methods for social communication analysis that can measure small changes in sentiment and personality states, including visual, acoustic and language modalities, are lagging, particularly when the goal is to build robust, appearance-invariant, and fair methods. We study and compare methods capable of fusing modalities while satisfying real-time and appearance-invariance conditions. We compare state-of-the-art transformer architectures in sentiment estimation and introduce them into the much less explored field of personality perception. We show that the architectures perform differently on automatic sentiment and personality perception, suggesting that each task may be better captured/modeled by a particular method. Our work calls attention to the attractive properties of the linear versions of the transformer architectures. In particular, we show that the best results are achieved by fusing the different architectures’ preprocessing methods. However, quadratic transformers impose extreme demands on computing power and energy consumption for real-time computation because of their memory requirements. In turn, linear transformers pave the way for quantifying small changes in sentiment estimation and personality perception for real-time social communication by machines and robots. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
PMLR |
Notes |
HuPBA; not mentioned |
Approved |
no |
Call Number |
Admin @ si @ FSJ2022 |
Serial |
3769 |
|
|
|
Author |
Marçal Rusiñol; R. Roset; Josep Llados; C. Montaner |
Title |
Automatic Index Generation of Digitized Map Series by Coordinate Extraction and Interpretation |
Type |
Journal Article |
Year |
2011 |
Publication |
e-Perimetron |
Abbreviated Journal |
ePER |
Volume |
6 |
Issue |
4 |
Pages |
219–229 |
Keywords |
|
Abstract |
Scanned map images are processed by means of computer vision algorithms in order to extract relevant geographic information from printed coordinate pairs. This meaningful information is then transformed into georeferencing information for each individual map sheet, and the complete set is compiled to produce a graphical index sheet for the map series along with relevant metadata. The whole process is fully automated and trained to attain maximum effectiveness and throughput. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ RRL2011a |
Serial |
1765 |
|
|
|
Author |
Oriol Ramos Terrades; Alejandro Hector Toselli; Nicolas Serrano; Veronica Romero; Enrique Vidal; Alfons Juan |
Title |
Interactive layout analysis and transcription systems for historic handwritten documents |
Type |
Conference Article |
Year |
2010 |
Publication |
10th ACM Symposium on Document Engineering |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
219–222 |
Keywords |
Handwriting recognition; Interactive predictive processing; Partial supervision; Interactive layout analysis |
Abstract |
The number of digitized legacy documents has been rising dramatically in recent years, mainly due to the increasing number of online digital libraries publishing such documents, which are waiting to be classified and finally transcribed into a textual electronic format (such as ASCII or PDF). Nevertheless, most of the available fully automatic applications addressing this task are far from perfect, and heavy, inefficient human intervention is often required to check and correct the results of such systems. In contrast, multimodal interactive-predictive approaches may allow users to participate in the process, helping the system improve overall performance. With this in mind, two sets of recent advances are introduced in this work: a novel interactive method for text block detection and two multimodal interactive handwritten text transcription systems that use active learning and interactive-predictive technologies in the recognition process. |
Address |
Manchester, United Kingdom |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ACM |
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ RTS2010 |
Serial |
1857 |
|
|
|
Author |
Katerine Diaz; Francesc J. Ferri; Aura Hernandez-Sabate |
Title |
An overview of incremental feature extraction methods based on linear subspaces |
Type |
Journal Article |
Year |
2018 |
Publication |
Knowledge-Based Systems |
Abbreviated Journal |
KBS |
Volume |
145 |
Issue |
|
Pages |
219–235 |
Keywords |
|
Abstract |
With the massive explosion of machine learning in our day-to-day life, incremental and adaptive learning has become a major topic, crucial for keeping classification models and their corresponding feature extraction processes up to date and improving them. This paper presents a categorized overview of incremental feature extraction based on linear subspace methods, which aim to incorporate new information into the already acquired knowledge without accessing previous data. Specifically, this paper focuses on linear dimensionality reduction methods with orthogonal matrix constraints based on a global loss function, due to the extensive use of their batch versions compared with other linear alternatives. Thus, we cover the approaches derived from Principal Component Analysis, Linear Discriminant Analysis and Discriminative Common Vector methods. For each basic method, its incremental approaches are differentiated according to the subspace model and matrix decomposition involved in the updating process. Besides this categorization, several updating strategies are distinguished according to the amount of data used to update and to whether a static or dynamic number of classes is considered. Moreover, the specific role of the size/dimension ratio in each method is considered. Finally, computational complexity, experimental setup and the accuracy rates according to published results are compiled and analyzed, and an empirical evaluation is performed to compare the best approach of each kind. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
0950-7051 |
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ DFH2018 |
Serial |
3090 |
|
|
|
Author |
Manuel Carbonell; Alicia Fornes; Mauricio Villegas; Josep Llados |
Title |
A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages |
Type |
Journal Article |
Year |
2020 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
Volume |
136 |
Issue |
|
Pages |
219–227 |
Keywords |
|
Abstract |
In recent years, the consolidation of deep neural network architectures for information extraction in document images has brought big improvements in the performance of each of the tasks involved in this process, namely text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work, we propose an end-to-end model that combines a one-stage object detection network with branches for the recognition of text and named entities, respectively, in such a way that shared features can be learned simultaneously from the training error of each of the tasks. By doing so, the model jointly performs handwritten text detection, transcription, and named entity recognition at page level in a single feed-forward step. We exhaustively evaluate our approach on different datasets, discussing its advantages and limitations compared to sequential approaches. The results show that the model is capable of benefiting from shared features by simultaneously solving interdependent tasks. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG; 600.140; 601.311; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ CFV2020 |
Serial |
3451 |