Publicacions CVC -- Query Results

[31–40] << 41 42 43 44 45 46 47 48 49 50 >> [51–60]

Details

Records
Author	Santiago Segui; Michal Drozdzal; Guillem Pascual; Petia Radeva; Carolina Malagelada; Fernando Azpiroz; Jordi Vitria
Title	Generic Feature Learning for Wireless Capsule Endoscopy Analysis			Type	Journal Article
Year	2016	Publication	Computers in Biology and Medicine	Abbreviated Journal	CBM
Volume	79	Issue		Pages	163-172
Keywords	Wireless capsule endoscopy; Deep learning; Feature learning; Motility analysis
Abstract	The interpretation and analysis of wireless capsule endoscopy (WCE) recordings is a complex task which requires sophisticated computer aided decision (CAD) systems to help physicians with video screening and, finally, with the diagnosis. Most CAD systems used in capsule endoscopy share a common system design, but use very different image and video representations. As a result, each time a new clinical application of WCE appears, a new CAD system has to be designed from the scratch. This makes the design of new CAD systems very time consuming. Therefore, in this paper we introduce a system for small intestine motility characterization, based on Deep Convolutional Neural Networks, which circumvents the laborious step of designing specific features for individual motility events. Experimental results show the superiority of the learned features over alternative classifiers constructed using state-of-the-art handcrafted features. In particular, it reaches a mean classification accuracy of 96% for six intestinal motility events, outperforming the other classifiers by a large margin (a 14% relative performance increase).
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	OR; MILAB;MV;			Approved	no
Call Number	Admin @ si @ SDP2016			Serial	2836
Permanent link to this record



Author	Maryam Asadi-Aghbolaghi; Albert Clapes; Marco Bellantonio; Hugo Jair Escalante; Victor Ponce; Xavier Baro; Isabelle Guyon; Shohreh Kasaei; Sergio Escalera
Title	A survey on deep learning based approaches for action and gesture recognition in image sequences			Type	Conference Article
Year	2017	Publication	12th IEEE International Conference on Automatic Face and Gesture Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The interest in action and gesture recognition has grown considerably in the last years. In this paper, we present a survey on current deep learning methodologies for action and gesture recognition in image sequences. We introduce a taxonomy that summarizes important aspects of deep learning for approaching both tasks. We review the details of the proposed architectures, fusion strategies, main datasets, and competitions. We summarize and discuss the main works proposed so far with particular interest on how they treat the temporal dimension of data, discussing their main features and identify opportunities and challenges for future research.
Address	Washington; USA; May 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	FG
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ ACB2017b			Serial	2982
Permanent link to this record



Author	Carles Fernandez; Pau Baiget; Xavier Roca; Jordi Gonzalez
Title	Interpretation of Complex Situations in a Semantic-based Surveillance Framework			Type	Journal
Year	2008	Publication	Signal Processing: Image Communication, Special Issue on Semantic Analysis for Interactive Multimedia Services	Abbreviated Journal
Volume	23	Issue	7	Pages	554-569
Keywords	Cognitive vision system; Situation analysis; Applied ontologies
Abstract	The integration of cognitive capabilities in computer vision systems requires both to enable high semantic expressiveness and to deal with high computational costs as large amounts of data are involved in the analysis. This contribution describes a cognitive vision system conceived to automatically provide high-level interpretations of complex real-time situations in outdoor and indoor scenarios, and to eventually maintain communication with casual end users in multiple languages. The main contributions are: (i) the design of an integrative multilevel architecture for cognitive surveillance purposes; (ii) the proposal of a coherent taxonomy of knowledge to guide the process of interpretation, which leads to the conception of a situation-based ontology; (iii) the use of situational analysis for content detection and a progressive interpretation of semantically rich scenes, by managing incomplete or uncertain knowledge, and (iv) the use of such an ontological background to enable multilingual capabilities and advanced end-user interfaces. Experimental results are provided to show the feasibility of the proposed approach.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ISE			Approved	no
Call Number	ISE @ ise @ FBR2008			Serial	954
Permanent link to this record



Author	Marc Masana; Bartlomiej Twardowski; Joost Van de Weijer
Title	On Class Orderings for Incremental Learning			Type	Conference Article
Year	2020	Publication	ICML Workshop on Continual Learning	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The influence of class orderings in the evaluation of incremental learning has received very little attention. In this paper, we investigate the impact of class orderings for incrementally learned classifiers. We propose a method to compute various orderings for a dataset. The orderings are derived by simulated annealing optimization from the confusion matrix and reflect different incremental learning scenarios, including maximally and minimally confusing tasks. We evaluate a wide range of state-of-the-art incremental learning methods on the proposed orderings. Results show that orderings can have a significant impact on performance and the ranking of the methods.
Address	Virtual; July 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICMLW
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ MTW2020			Serial	3505
Permanent link to this record



Author	Carles Fernandez; Jordi Gonzalez; Joao Manuel R. S. Taveres; Xavier Roca
Title	Towards Ontological Cognitive System			Type	Book Chapter
Year	2013	Publication	Topics in Medical Image Processing and Computational Vision	Abbreviated Journal
Volume	8	Issue		Pages	87-99
Keywords
Abstract	The increasing ubiquitousness of digital information in our daily lives has positioned video as a favored information vehicle, and given rise to an astonishing generation of social media and surveillance footage. This raises a series of technological demands for automatic video understanding and management, which together with the compromising attentional limitations of human operators, have motivated the research community to guide its steps towards a better attainment of such capabilities. As a result, current trends on cognitive vision promise to recognize complex events and self-adapt to different environments, while managing and integrating several types of knowledge. Future directions suggest to reinforce the multi-modal fusion of information sources and the communication with end-users.
Address
Corporate Author				Thesis
Publisher	Springer Netherlands	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	2212-9391	ISBN	978-94-007-0725-2	Medium
Area		Expedition		Conference
Notes	ISE; 605.203; 302.018; 600.049			Approved	no
Call Number	Admin @ si @ FGT2013			Serial	2287
Permanent link to this record



Author	Carles Fernandez
Title	Understanding Image Sequences: the Role of Ontologies in Cognitive Vision			Type	Book Whole
Year	2010	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The increasing ubiquitousness of digital information in our daily lives has positioned video as a favored information vehicle, and given rise to an astonishing generation of social media and surveillance footage. This raises a series of technological demands for automatic video understanding and management, which together with the compromising attentional limitations of human operators, have motivated the research community to guide its steps towards a better attainment of such capabilities. As a result, current trends on cognitive vision promise to recognize complex events and self-adapt to different environments, while managing and integrating several types of knowledge. Future directions suggest to reinforce the multi-modal fusion of information sources and the communication with end-users. In this thesis we tackle the problem of recognizing and describing meaningful events in video sequences from different domains, and communicating the resulting knowledge to end-users by means of advanced interfaces for human–computer interaction. This problem is addressed by designing the high-level modules of a cognitive vision framework exploiting ontological knowledge. Ontologies allow us to define the relevant concepts in a domain and the relationships among them; we prove that the use of ontologies to organize, centralize, link, and reuse different types of knowledge is a key factor in the materialization of our objectives. The proposed framework contributes to: (i) automatically learn the characteristics of different scenarios in a domain; (ii) reason about uncertain, incomplete, or vague information from visual –camera’s– or linguistic –end-user’s– inputs; (iii) derive plausible interpretations of complex events from basic spatiotemporal developments; (iv) facilitate natural interfaces that adapt to the needs of end-users, and allow them to communicate efficiently with the system at different levels of interaction; and finally, (v) find mechanisms to guide modeling processes, maintain and extend the resulting models, and to exploit multimodal resources synergically to enhance the former tasks. We describe a holistic methodology to achieve these goals. First, the use of prior taxonomical knowledge is proved useful to guide MAP-MRF inference processes in the automatic identification of semantic regions, with independence of a particular scenario. Towards the recognition of complex video events, we combine fuzzy metric-temporal reasoning with SGTs, thus assessing high-level interpretations from spatiotemporal data. Here, ontological resources like T–Boxes, onomasticons, or factual databases become useful to derive video indexing and retrieval capabilities, and also to forward highlighted content to smart user interfaces. There, we explore the application of ontologies to discourse analysis and cognitive linguistic principles, or scene augmentation techniques towards advanced communication by means of natural language dialogs and synthetic visualizations. Ontologies become fundamental to coordinate, adapt, and reuse the different modules in the system. The suitability of our ontological framework is demonstrated by a series of applications that especially benefit the field of smart video surveillance, viz. automatic generation of linguistic reports about the content of video sequences in multiple natural languages; content-based filtering and summarization of these reports; dialogue-based interfaces to query and browse video contents; automatic learning of semantic regions in a scenario; and tools to evaluate the performance of components and models in the system, via simulation and augmented reality.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Jordi Gonzalez;Xavier Roca
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-937261-2-6	Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ Fer2010a			Serial	1333
Permanent link to this record



Author	Bojana Gajic; Eduard Vazquez; Ramon Baldrich
Title	Evaluation of Deep Image Descriptors for Texture Retrieval			Type	Conference Article
Year	2017	Publication	Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017)	Abbreviated Journal
Volume		Issue		Pages	251-257
Keywords	Texture Representation; Texture Retrieval; Convolutional Neural Networks; Psychophysical Evaluation
Abstract	The increasing complexity learnt in the layers of a Convolutional Neural Network has proven to be of great help for the task of classification. The topic has received great attention in recently published literature. Nonetheless, just a handful of works study low-level representations, commonly associated with lower layers. In this paper, we explore recent findings which conclude, counterintuitively, the last layer of the VGG convolutional network is the best to describe a low-level property such as texture. To shed some light on this issue, we are proposing a psychophysical experiment to evaluate the adequacy of different layers of the VGG network for texture retrieval. Results obtained suggest that, whereas the last convolutional layer is a good choice for a specific task of classification, it might not be the best choice as a texture descriptor, showing a very poor performance on texture retrieval. Intermediate layers show the best performance, showing a good combination of basic filters, as in the primary visual cortex, and also a degree of higher level information to describe more complex textures.
Address	Porto, Portugal; 27 February – 1 March 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISIGRAPP
Notes	CIC; 600.087			Approved	no
Call Number	Admin @ si @			Serial	3710
Permanent link to this record



Author	Paula Fritzsche; C.Roig; Ana Ripoll; Emilio Luque; Aura Hernandez-Sabate
Title	A Performance Prediction Methodology for Data-dependent Parallel Applications			Type	Conference Article
Year	2006	Publication	Proceedings of the IEEE International Conference on Cluster Computing	Abbreviated Journal
Volume		Issue		Pages	1-8
Keywords
Abstract	The increase in the use of parallel distributed architectures in order to solve large-scale scientific problems has generated the need for performance prediction for both deterministic applications and non-deterministic applications. In particular, the performance prediction of data dependent programs is an extremely challenging problem because for a specific issue the input datasets may cause different execution times. Generally, a parallel application is characterized as a collection of tasks and their interrelations. If the application is time-critical it is not enough to work with only one value per task, and consequently knowledge of the distribution of task execution times is crucial. The development of a new prediction methodology to estimate the performance of data-dependent parallel applications is the primary target of this study. This approach makes it possible to evaluate the parallel performance of an application without the need of implementation. A real data-dependent arterial structure detection application model is used to apply the methodology proposed. The predicted times obtained using the new methodology for genuine datasets are compared with predicted times that arise from using only one execution value per task. Finally, the experimental study shows that the new methodology generates more precise predictions.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	IAM			Approved	no
Call Number	IAM @ iam @ FRR2006			Serial	1497
Permanent link to this record



Author	Eduardo Aguilar; Beatriz Remeseiro; Marc Bolaños; Petia Radeva
Title	Grab, Pay, and Eat: Semantic Food Detection for Smart Restaurants			Type	Journal Article
Year	2018	Publication	IEEE Transactions on Multimedia	Abbreviated Journal
Volume	20	Issue	12	Pages	3266 - 3275
Keywords
Abstract	The increase in awareness of people towards their nutritional habits has drawn considerable attention to the field of automatic food analysis. Focusing on self-service restaurants environment, automatic food analysis is not only useful for extracting nutritional information from foods selected by customers, it is also of high interest to speed up the service solving the bottleneck produced at the cashiers in times of high demand. In this paper, we address the problem of automatic food tray analysis in canteens and restaurants environment, which consists in predicting multiple foods placed on a tray image. We propose a new approach for food analysis based on convolutional neural networks, we name Semantic Food Detection, which integrates in the same framework food localization, recognition and segmentation. We demonstrate that our method improves the state of the art food detection by a considerable margin on the public dataset UNIMIB2016 achieving about 90% in terms of F-measure, and thus provides a significant technological advance towards the automatic billing in restaurant environments.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MILAB; no proj			Approved	no
Call Number	Admin @ si @ ARB2018			Serial	3236
Permanent link to this record



Author	Ivet Rafegas; Maria Vanrell; Luis A Alexandre; G. Arias
Title	Understanding trained CNNs by indexing neuron selectivity			Type	Journal Article
Year	2020	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
Volume	136	Issue		Pages	318-325
Keywords
Abstract	The impressive performance of Convolutional Neural Networks (CNNs) when solving different vision problems is shadowed by their black-box nature and our consequent lack of understanding of the representations they build and how these representations are organized. To help understanding these issues, we propose to describe the activity of individual neurons by their Neuron Feature visualization and quantify their inherent selectivity with two specific properties. We explore selectivity indexes for: an image feature (color); and an image label (class membership). Our contribution is a framework to seek or classify neurons by indexing on these selectivity properties. It helps to find color selective neurons, such as a red-mushroom neuron in layer Conv4 or class selective neurons such as dog-face neurons in layer Conv5 in VGG-M, and establishes a methodology to derive other selectivity properties. Indexing on neuron selectivity can statistically draw how features and classes are represented through layers in a moment when the size of trained nets is growing and automatic tools to index neurons can be helpful.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	CIC; 600.087; 600.140; 600.118			Approved	no
Call Number	Admin @ si @ RVL2019			Serial	3310
Permanent link to this record



Author	A.Kesidis; Dimosthenis Karatzas
Title	Logo and Trademark Recognition			Type	Book Chapter
Year	2014	Publication	Handbook of Document Image Processing and Recognition	Abbreviated Journal
Volume	D	Issue		Pages	591-646
Keywords	Logo recognition; Logo removal; Logo spotting; Trademark registration; Trademark retrieval systems
Abstract	The importance of logos and trademarks in nowadays society is indisputable, variably seen under a positive light as a valuable service for consumers or a negative one as a catalyst of ever-increasing consumerism. This chapter discusses the technical approaches for enabling machines to work with logos, looking into the latest methodologies for logo detection, localization, representation, recognition, retrieval, and spotting in a variety of media. This analysis is presented in the context of three different applications covering the complete depth and breadth of state of the art techniques. These are trademark retrieval systems, logo recognition in document images, and logo detection and removal in images and videos. This chapter, due to the very nature of logos and trademarks, brings together various facets of document image analysis spanning graphical and textual content, while it links document image analysis to other computer vision domains, especially when it comes to the analysis of real-scene videos and images.
Address
Corporate Author				Thesis
Publisher	Springer London	Place of Publication		Editor	D. Doermann; K. Tombre
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-0-85729-858-4	Medium
Area		Expedition		Conference
Notes	DAG; 600.077			Approved	no
Call Number	Admin @ si @ KeK2014			Serial	2425
Permanent link to this record



Author	Y. Patel; Lluis Gomez; Raul Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
Title	TextTopicNet-Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces			Type	Miscellaneous
Year	2018	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The immense success of deep learning based methods in computer vision heavily relies on large scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such datasets requires a tremendous amount of human effort and annotations are limited to popular set of classes. As an alternative, learning visual features by designing auxiliary tasks which make use of freely available self-supervision has become increasingly popular in the computer vision community. In this paper, we put forward an idea to take advantage of multi-modal context to provide self-supervision for the training of computer vision algorithms. We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration. More specifically we use popular text embedding techniques to provide the self-supervision for the training of deep CNN.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.084; 601.338; 600.121			Approved	no
Call Number	Admin @ si @ PGG2018			Serial	3177
Permanent link to this record



Author	Xavier Soria
Title	Single sensor multi-spectral imaging			Type	Book Whole
Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The image sensor, nowadays, is rolling the smartphone industry. While some phone brands explore equipping more image sensors, others, like Google, maintain their smartphones with just one sensor; but this sensor is equipped with Deep Learning to enhance the image quality. However, what all brands agree on is the need to research new image sensors; for instance, in 2015 Omnivision and PixelTeq presented new CMOS based image sensors defined as multispectral Single Sensor Camera (SSC), which are capable of capturing multispectral bands. This dissertation presents the benefits of using a multispectral SSCs that, as aforementioned, simultaneously acquires images in the visible and near-infrared (NIR) bands. The principal benefits while addressing problems related to image bands in the spectral range of 400 to 1100 nanometers, there are cost reductions in the hardware and software setup because only one SSC is needed instead of two, and the images alignment are not required any more. Concerning to the NIR spectrum, many works in literature have proven the benefits of working with NIR to enhance RGB images (e.g., image enhancement, remove shadows, dehazing, etc.). In spite of the advantage of using SSC (e.g., low latency), there are some drawback to be solved. One of this drawback corresponds to the nature of the silicon-based sensor, which in addition to capture the RGB image, when the infrared cut off filter is not installed it also acquires NIR information into the visible image. This phenomenon is called RGB and NIR crosstalking. This thesis firstly faces this problem in challenging images and then it shows the benefit of using multispectral images in the edge detection task. The RGB color restoration from RGBN image is the topic tackled in RGB and NIR crosstalking. Even though in the literature a set of processes have been proposed to face this issue, in this thesis novel approaches, based on DL, are proposed to subtract the additional NIR included in the RGB channel. More precisely, an Artificial Neural Network (NN) and two Convolutional Neural Network (CNN) models are proposed. As the DL based models need a dataset with a large collection of image pairs, a large dataset is collected to address the color restoration. The collected images are from challenging scenes where the sunlight radiation is sufficient to give absorption/reflectance properties to the considered scenes. An extensive evaluation has been conducted on the CNN models, differences from most of the restored images are almost imperceptible to the human eye. The next proposal of the thesis is the validation of the usage of SSC images in the edge detection task. Three methods based on CNN have been proposed. While the first one is based on the most used model, holistically-nested edge detection (HED) termed as multispectral HED (MS-HED), the other two have been proposed observing the drawbacks of MS-HED. These two novel architectures have been designed from scratch (training from scratch); after the first architecture is validated in the visible domain a slight redesign is proposed to tackle the multispectral domain. Again, another dataset is collected to face this problem with SSCs. Even though edge detection is confronted in the multispectral domain, its qualitative and quantitative evaluation demonstrates the generalization in other datasets used for edge detection, improving state-of-the-art results.
Address	September 2019
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Angel Sappa
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-948531-9-7	Medium
Area		Expedition		Conference
Notes	MSIAU; 600.122			Approved	no
Call Number	Admin @ si @ Sor2019			Serial	3391
Permanent link to this record



Author	Dimosthenis Karatzas; Lluis Gomez; Marçal Rusiñol; Anguelos Nicolaou
Title	The Robust Reading Competition Annotation and Evaluation Platform			Type	Conference Article
Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
Volume		Issue		Pages	61-66
Keywords
Abstract	The ICDAR Robust Reading Competition (RRC), initiated in 2003 and reestablished in 2011, has become the defacto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation of data, and to provide online and offline performance evaluation and analysis services.
Address	Viena; Austria; April 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	DAS
Notes	DAG; 600.084; 600.121			Approved	no
Call Number	KGR2018			Serial	3103
Permanent link to this record



Author	Zheng Huang; Kai Chen; Jianhua He; Xiang Bai; Dimosthenis Karatzas; Shijian Lu; CV Jawahar
Title	ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction			Type	Conference Article
Year	2019	Publication	15th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume		Issue		Pages	1516-1520
Keywords
Abstract	The ICDAR 2019 Challenge on “Scanned receipts OCR and key information extraction” (SROIE) covers important aspects related to the automated analysis of scanned receipts. The SROIE tasks play a key role in many document analysis systems and hold significant commercial potential. Although a lot of work has been published over the years on administrative document analysis, the community has advanced relatively slowly, as most datasets have been kept private. One of the key contributions of SROIE to the document analysis community is to offer a first, standardized dataset of 1000 whole scanned receipt images and annotations, as well as an evaluation procedure for such tasks. The Challenge is structured around three tasks, namely Scanned Receipt Text Localization (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3). The competition opened on 10th February, 2019 and closed on 5th May, 2019. We received 29, 24 and 18 valid submissions received for the three competition tasks, respectively. This report presents the competition datasets, define the tasks and the evaluation protocols, offer detailed submission statistics, as well as an analysis of the submitted performance. While the tasks of text localization and recognition seem to be relatively easy to tackle, it is interesting to observe the variety of ideas and approaches proposed for the information extraction task. According to the submissions' performance we believe there is still margin for improving information extraction performance, although the current dataset would have to grow substantially in following editions. Given the success of the SROIE competition evidenced by the wide interest generated and the healthy number of submissions from academic, research institutes and industry over different countries, we consider that the SROIE competition can evolve into a useful resource for the community, drawing further attention and promoting research and development efforts in this field.
Address	Sydney; Australia; September 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICDAR
Notes	DAG; 600.129			Approved	no
Call Number	Admin @ si @ HCH2019			Serial	3338
Permanent link to this record