Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	661–675 of 3413 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

[31–40] << 41 42 43 44 45 46 47 48 49 50 >> [51–60]

List View

Citations

Details

	Records
	Author	Antonio Clavelli
	Title	A computational model of eye guidance, searching for text in real scene images			Type	Book Whole
	Year	2014	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Searching for text objects in real scene images is an open problem and a very active computer vision research area. A large number of methods have been proposed tackling the text search as extension of the ones from the document analysis field or inspired by general purpose object detection methods. However the general problem of object search in real scene images remains an extremely challenging problem due to the huge variability in object appearance. This thesis builds on top of the most recent findings in the visual attention literature presenting a novel computational model of eye guidance aiming to better describe text object search in real scene images. First are presented the relevant state-of-the-art results from the visual attention literature regarding eye movements and visual search. Relevant models of attention are discussed and integrated with recent observations on the role of top-down constraints and the emerging need for a layered model of attention in which saliency is not the only factor guiding attention. Visual attention is then explained by the interaction of several modulating factors, such as objects, value, plans and saliency. Then we introduce our probabilistic formulation of attention deployment in real scene. The model is based on the rationale that oculomotor control depends on two interacting but distinct processes: an attentional process that assigns value to the sources of information and motor process that flexibly links information with action. In such framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the reward of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects. In the experimental section the model is tested in laboratory condition, comparing model simulations with data from eye tracking experiments. The comparison is qualitative in terms of observable scan paths and quantitative in terms of statistical similarity of gaze shift amplitude. Experiments are performed using eye tracking data from both a publicly available dataset of face and text and from newly performed eye-tracking experiments on a dataset of street view pictures containing text. The last part of this thesis is dedicated to study the extent to which the proposed model can account for human eye movements in a low constrained setting. We used a mobile eye tracking device and an ad-hoc developed methodology to compare model simulated eye data with the human eye data from mobile eye tracking recordings. Such setting allow to test the model in an incomplete visual information condition, reproducing a close to real-life search task.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Dimosthenis Karatzas;Giuseppe Boccignone;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-940902-6-4	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ Cla2014			Serial	2571
Permanent link to this record



	Author	Albert Clapes
	Title	Learning to recognize human actions: from hand-crafted to deep-learning based visual representations			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Action recognition is a very challenging and important problem in computer vision. Researchers working on this field aspire to provide computers with the abil ity to visually perceive human actions – that is, to observe, interpret, and under stand human-related events that occur in the physical environment merely from visual data. The applications of this technology are numerous: human-machine interaction, e-health, monitoring/surveillance, and content-based video retrieval, among others. Hand-crafted methods dominated the field until the apparition of the first successful deep learning-based action recognition works. Although ear lier deep-based methods underperformed with respect to hand-crafted approaches, these slowly but steadily improved to become state-of-the-art, eventually achieving better results than hand-crafted ones. Still, hand-crafted approaches can be advan tageous in certain scenarios, specially when not enough data is available to train very large deep models or simply to be combined with deep-based methods to fur ther boost the performance. Hence, showing how hand-crafted features can provide extra knowledge the deep networks are notable to easily learn about human actions. This Thesis concurs in time with this change of paradigm and, hence, reflects it into two distinguished parts. In the first part, we focus on improving current suc cessful hand-crafted approaches for action recognition and we do so from three dif ferent perspectives. Using the dense trajectories framework as a backbone: first, we explore the use of multi-modal and multi-view input data to enrich the trajectory de scriptors. Second, we focus on the classification part of action recognition pipelines and propose an ensemble learning approach, where each classifier leams from a different set of local spatiotemporal features to then combine their outputs following an strategy based on the Dempster-Shaffer Theory. And third, we propose a novel hand-crafted feature extraction method that constructs a rnid-level feature descrip tion to better modellong-term spatiotemporal dynarnics within action videos. Moving to the second part of the Thesis, we start with a comprehensive study of the current deep-learning based action recognition methods. We review both fun damental and cutting edge methodologies reported during the last few years and introduce a taxonomy of deep-leaming methods dedicated to action recognition. In particular, we analyze and discuss how these handle the temporal dimension of data. Last but not least, we propose a residual recurrent network for action recogni tion that naturally integrates all our previous findings in a powerful and prornising framework.
	Address	January 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Sergio Escalera
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-2-8	Medium
	Area		Expedition		Conference
	Notes	HUPBA			Approved	no
	Call Number	Admin @ si @ Cla2019			Serial	3219
Permanent link to this record



	Author	Esteve Cervantes; Long Long Yu; Andrew Bagdanov; Marc Masana; Joost Van de Weijer
	Title	Hierarchical Part Detection with Deep Neural Networks			Type	Conference Article
	Year	2016	Publication	23rd IEEE International Conference on Image Processing	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	Object Recognition; Part Detection; Convolutional Neural Networks
	Abstract	Part detection is an important aspect of object recognition. Most approaches apply object proposals to generate hundreds of possible part bounding box candidates which are then evaluated by part classifiers. Recently several methods have investigated directly regressing to a limited set of bounding boxes from deep neural network representation. However, for object parts such methods may be unfeasible due to their relatively small size with respect to the image. We propose a hierarchical method for object and part detection. In a single network we first detect the object and then regress to part location proposals based only on the feature representation inside the object. Experiments show that our hierarchical approach outperforms a network which directly regresses the part locations. We also show that our approach obtains part detection accuracy comparable or better than state-of-the-art on the CUB-200 bird and Fashionista clothing item datasets with only a fraction of the number of part proposals.
	Address	Phoenix; Arizona; USA; September 2016
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICIP
	Notes	LAMP; 600.106			Approved	no
	Call Number	Admin @ si @ CLB2016			Serial	2762
Permanent link to this record



	Author	Felipe Codevilla; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy
	Title	On Offline Evaluation of Vision-based Driving Models			Type	Conference Article
	Year	2018	Publication	15th European Conference on Computer Vision	Abbreviated Journal
	Volume	11219	Issue		Pages	246-262
	Keywords	Autonomous driving; deep learning
	Abstract	Autonomous driving models should ideally be evaluated by deploying them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessarily correlated with driving quality, and two models with identical prediction error can differ dramatically in their driving performance. We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and suitable offline metrics.
	Address	Munich; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCV
	Notes	ADAS; 600.124; 600.118			Approved	no
	Call Number	Admin @ si @ CLK2018			Serial	3162
Permanent link to this record



	Author	Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
	Title	How Much Does Audio Matter to Recognize Egocentric Object Interactions?			Type	Miscellaneous
	Year	2019	Publication	Arxiv	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	CoRR abs/1906.00634 Sounds are an important source of information on our daily interactions with objects. For instance, a significant amount of people can discern the temperature of water that it is being poured just by using the sense of hearing. However, only a few works have explored the use of audio for the classification of object interactions in conjunction with vision or as single modality. In this preliminary work, we propose an audio model for egocentric action recognition and explore its usefulness on the parts of the problem (noun, verb, and action classification). Our model achieves a competitive result in terms of verb classification (34.26% accuracy) on a standard benchmark with respect to vision-based state of the art systems, using a comparatively lighter architecture.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; no menciona			Approved	no
	Call Number	Admin @ si @ CLR2019			Serial	3383
Permanent link to this record



	Author	Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
	Title	Seeing and Hearing Egocentric Actions: How Much Can We Learn?			Type	Conference Article
	Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	4470-4480
	Keywords
	Abstract	Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have considered to integrate the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.
	Address	Seul; Korea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ CLR2019b			Serial	3385
Permanent link to this record



	Author	Chee-Kheng Chng; Yuliang Liu; Yipeng Sun; Chun Chet Ng; Canjie Luo; Zihan Ni; ChuanMing Fang; Shuaitao Zhang; Junyu Han; Errui Ding; Jingtuo Liu; Dimosthenis Karatzas; Chee Seng Chan; Lianwen Jin
	Title	ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text – RRC-ArT			Type	Conference Article
	Year	2019	Publication	15th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1571-1576
	Keywords
	Abstract	This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text – RRC-ArT that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 – 82.65%, ii) T2.1 – 74.3%, iii) T2.2 – 85.32%, iv) T3.1 – 53.86%, and v) T3.2 – 54.91%. Apart from the results, this paper also details the ArT dataset, tasks description, evaluation metrics and participants' methods. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ CLS2019			Serial	3340
Permanent link to this record



	Author	Ciprian Corneanu; Meysam Madadi; Sergio Escalera
	Title	Deep Structure Inference Network for Facial Action Unit Recognition			Type	Conference Article
	Year	2018	Publication	15th European Conference on Computer Vision	Abbreviated Journal
	Volume	11216	Issue		Pages	309-324
	Keywords	Computer Vision; Machine Learning; Deep Learning; Facial Expression Analysis; Facial Action Units; Structure Inference
	Abstract	Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for general facial expression analysis. Recently, efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between AUs. We propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and replicating a message passing algorithm between classes similar to a graphical model inference approach in later stages. We show that by training the model end-to-end with increased supervision we improve state-of-the-art by 5.3% and 8.2% performance on BP4D and DISFA datasets, respectively.
	Address	Munich; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCV
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ CME2018			Serial	3205
Permanent link to this record



	Author	Ciprian Corneanu; Meysam Madadi; Sergio Escalera; Aleix M. Martinez
	Title	What does it mean to learn in deep networks? And, how does one detect adversarial attacks?			Type	Conference Article
	Year	2019	Publication	32nd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	4752-4761
	Keywords
	Abstract	The flexibility and high-accuracy of Deep Neural Networks (DNNs) has transformed computer vision. But, the fact that we do not know when a specific DNN will work and when it will fail has resulted in a lack of trust. A clear example is self-driving cars; people are uncomfortable sitting in a car driven by algorithms that may fail under some unknown, unpredictable conditions. Interpretability and explainability approaches attempt to address this by uncovering what a DNN models, i.e., what each node (cell) in the network represents and what images are most likely to activate it. This can be used to generate, for example, adversarial attacks. But these approaches do not generally allow us to determine where a DNN will succeed or fail and why. i.e., does this learned representation generalize to unseen samples? Here, we derive a novel approach to define what it means to learn in deep networks, and how to use this knowledge to detect adversarial attacks. We show how this defines the ability of a network to generalize to unseen testing samples and, most importantly, why this is the case.
	Address	California; June 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPR
	Notes	HuPBA; no proj			Approved	no
	Call Number	Admin @ si @ CME2019			Serial	3332
Permanent link to this record



	Author	Ciprian Corneanu; Meysam Madadi; Sergio Escalera; Aleix Martinez
	Title	Explainable Early Stopping for Action Unit Recognition			Type	Conference Article
	Year	2020	Publication	Faces and Gestures in E-health and welfare workshop	Abbreviated Journal
	Volume		Issue		Pages	693-699
	Keywords
	Abstract	A common technique to avoid overfitting when training deep neural networks (DNN) is to monitor the performance in a dedicated validation data partition and to stop training as soon as it saturates. This only focuses on what the model does, while completely ignoring what happens inside it. In this work, we open the “black-box” of DNN in order to perform early stopping. We propose to use a novel theoretical framework that analyses meso-scale patterns in the topology of the functional graph of a network while it trains. Based on it, we decide when it transitions from learning towards overfitting in a more explainable way. We exemplify the benefits of this approach on a state-of-the art custom DNN that jointly learns local representations and label structure employing an ensemble of dedicated subnetworks. We show that it is practically equivalent in performance to early stopping with patience, the standard early stopping algorithm in the literature. This proves beneficial for AU recognition performance and provides new insights into how learning of AUs occurs in DNNs.
	Address	Virtual; November 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	FGW
	Notes	HUPBA;			Approved	no
	Call Number	Admin @ si @ CME2020			Serial	3514
Permanent link to this record



	Author	Victor M. Campello; Carlos Martin-Isla; Cristian Izquierdo; Andrea Guala; Jose F. Rodriguez Palomares; David Vilades; Martin L. Descalzo; Mahir Karakas; Ersin Cavus; Zahra Zahra Raisi-Estabragh; Steffen E. Petersen; Sergio Escalera; Santiago Segui; Karim Lekadir
	Title	Minimising multi-centre radiomics variability through image normalisation: a pilot study			Type	Journal Article
	Year	2022	Publication	Scientific Reports	Abbreviated Journal	ScR
	Volume	12	Issue	1	Pages	12532
	Keywords
	Abstract	Radiomics is an emerging technique for the quantification of imaging data that has recently shown great promise for deeper phenotyping of cardiovascular disease. Thus far, the technique has been mostly applied in single-centre studies. However, one of the main difficulties in multi-centre imaging studies is the inherent variability of image characteristics due to centre differences. In this paper, a comprehensive analysis of radiomics variability under several image- and feature-based normalisation techniques was conducted using a multi-centre cardiovascular magnetic resonance dataset. 218 subjects divided into healthy (n = 112) and hypertrophic cardiomyopathy (n = 106, HCM) groups from five different centres were considered. First and second order texture radiomic features were extracted from three regions of interest, namely the left and right ventricular cavities and the left ventricular myocardium. Two methods were used to assess features’ variability. First, feature distributions were compared across centres to obtain a distribution similarity index. Second, two classification tasks were proposed to assess: (1) the amount of centre-related information encoded in normalised features (centre identification) and (2) the generalisation ability for a classification model when trained on these features (healthy versus HCM classification). The results showed that the feature-based harmonisation technique ComBat is able to remove the variability introduced by centre information from radiomic features, at the expense of slightly degrading classification performance. Piecewise linear histogram matching normalisation gave features with greater generalisation ability for classification ( balanced accuracy in between 0.78 ± 0.08 and 0.79 ± 0.09). Models trained with features from images without normalisation showed the worst performance overall ( balanced accuracy in between 0.45 ± 0.28 and 0.60 ± 0.22). In conclusion, centre-related information removal did not imply good generalisation ability for classification.
	Address	2022/07/22
	Corporate Author				Thesis
	Publisher	Springer Nature	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA			Approved	no
	Call Number	Admin @ si @ CMI2022			Serial	3749
Permanent link to this record



	Author	Felipe Codevilla; Matthias Muller; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy
	Title	End-to-end Driving via Conditional Imitation Learning			Type	Conference Article
	Year	2018	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
	Volume		Issue		Pages	4693 - 4700
	Keywords
	Abstract	Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at this https URL
	Address	Brisbane; Australia; May 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICRA
	Notes	ADAS; 600.116; 600.124; 600.118			Approved	no
	Call Number	Admin @ si @ CML2018			Serial	3108
Permanent link to this record



	Author	R. Clariso; David Masip; A. Rius
	Title	Student projects empowering mobile learning in higher education			Type	Journal
	Year	2014	Publication	Revista de Universidad y Sociedad del Conocimiento	Abbreviated Journal	RUSC
	Volume	11	Issue		Pages	192-207
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1698-580X	ISBN		Medium
	Area		Expedition		Conference
	Notes	OR;MV			Approved	no
	Call Number	Admin @ si @ CMR2014			Serial	2619
Permanent link to this record



	Author	Alejandro Cartas; Juan Marin; Petia Radeva; Mariella Dimiccoli
	Title	Batch-based activity recognition from egocentric photo-streams revisited			Type	Journal Article
	Year	2018	Publication	Pattern Analysis and Applications	Abbreviated Journal	PAA
	Volume	21	Issue	4	Pages	953–965
	Keywords	Egocentric vision; Lifelogging; Activity recognition; Deep learning; Recurrent neural networks
	Abstract	Wearable cameras can gather large amounts of image data that provide rich visual information about the daily activities of the wearer. Motivated by the large number of health applications that could be enabled by the automatic recognition of daily activities, such as lifestyle characterization for habit improvement, context-aware personal assistance and tele-rehabilitation services, we propose a system to classify 21 daily activities from photo-streams acquired by a wearable photo-camera. Our approach combines the advantages of a late fusion ensemble strategy relying on convolutional neural networks at image level with the ability of recurrent neural networks to account for the temporal evolution of high-level features in photo-streams without relying on event boundaries. The proposed batch-based approach achieved an overall accuracy of 89.85%, outperforming state-of-the-art end-to-end methodologies. These results were achieved on a dataset consists of 44,902 egocentric pictures from three persons captured during 26 days in average.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ CMR2018			Serial	3186
Permanent link to this record



	Author	Manuel Carbonell; Joan Mas; Mauricio Villegas; Alicia Fornes; Josep Llados
	Title	End-to-End Handwritten Text Detection and Transcription in Full Pages			Type	Conference Article
	Year	2019	Publication	2nd International Workshop on Machine Learning	Abbreviated Journal
	Volume	5	Issue		Pages	29-34
	Keywords	Handwritten Text Recognition; Layout Analysis; Text segmentation; Deep Neural Networks; Multi-task learning
	Abstract	When transcribing handwritten document images, inaccuracies in the text segmentation step often cause errors in the subsequent transcription step. For this reason, some recent methods propose to perform the recognition at paragraph level. But still, errors in the segmentation of paragraphs can affect the transcription performance. In this work, we propose an end-to-end framework to transcribe full pages. The joint text detection and transcription allows to remove the layout analysis requirement at test time. The experimental results show that our approach can achieve comparable results to models that assume segmented paragraphs, and suggest that joining the two tasks brings an improvement over doing the two tasks separately.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR WML
	Notes	DAG; 600.140; 601.311; 600.140			Approved	no
	Call Number	Admin @ si @ CMV2019			Serial	3353
Permanent link to this record