Records |
Author |
Bartlomiej Twardowski; Pawel Zawistowski; Szymon Zaborowski |
Title |
Metric Learning for Session-Based Recommendations |
Type |
Conference Article |
Year |
2021 |
Publication |
43rd edition of the annual BCS-IRSG European Conference on Information Retrieval |
Abbreviated Journal |
|
Volume |
12656 |
Issue |
|
Pages |
650-665 |
Keywords |
Session-based recommendations; Deep metric learning; Learning to rank |
Abstract |
Session-based recommenders, used for making predictions out of users’ uninterrupted sequences of actions, are attractive for many applications. For this task, we propose using metric learning, in which a common embedding space for sessions and items is created, and distance measures the dissimilarity between the provided sequence of users’ events and the next action. We discuss and compare metric learning approaches with commonly used learning-to-rank methods, where some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither extremely large nor deep architectures are necessary to outperform existing methods. Experimental results against strong baselines on four datasets are provided, together with an ablation study. |
Address |
Virtual; March 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ECIR |
Notes |
LAMP; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ TZZ2021 |
Serial |
3586 |
Permanent link to this record |
|
|
|
Author |
Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas |
Title |
ICDAR 2021 Competition on Document Visual Question Answering |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
635-649 |
Keywords |
|
Abstract |
In this report we present the results of the ICDAR 2021 edition of the Document Visual Question Answering Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced task on Infographics VQA. Infographics VQA is based on a new dataset of more than 5,000 infographics images and 30,000 question-answer pairs. The winning methods scored 0.6120 ANLS in the Infographics VQA task, 0.7743 ANLSL in the Document Collection VQA task and 0.8705 ANLS in the Single Document VQA task. We present a summary of the datasets used for each task, a description of each of the submitted methods, and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented. |
Address |
Virtual; Lausanne; Switzerland; September 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICDAR |
Notes |
DAG; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ TMJ2021 |
Serial |
3624 |
Permanent link to this record |
|
|
|
Author |
Ricardo Dario Perez Principi; Cristina Palmero; Julio C. S. Jacques Junior; Sergio Escalera |
Title |
On the Effect of Observed Subject Biases in Apparent Personality Analysis from Audio-visual Signals |
Type |
Journal Article |
Year |
2021 |
Publication |
IEEE Transactions on Affective Computing |
Abbreviated Journal |
TAC |
Volume |
12 |
Issue |
3 |
Pages |
607-621 |
Keywords |
|
Abstract |
Personality perception is implicitly biased due to many subjective factors, such as culture, social context, gender and appearance. Approaches developed for automatic personality perception are not expected to predict the real personality of the target, but the personality that external observers attribute to it. Hence, they have to deal with human bias, inherently transferred to the training data. However, bias analysis in personality computing is an almost unexplored area. In this work, we study different possible sources of bias affecting personality perception, including emotions from facial expressions, attractiveness, age, gender and ethnicity, as well as their influence on prediction ability for apparent personality estimation. To this end, we propose a multi-modal deep neural network that combines raw audio and visual information with the predictions of attribute-specific models to regress apparent personality. We also analyse spatio-temporal aggregation schemes and the effect of different time intervals on first impressions. We base our study on the ChaLearn First Impressions dataset, consisting of one-person conversational videos. Our model shows state-of-the-art results regressing apparent personality based on the Big-Five model. Furthermore, given the interpretable nature of our network design, we provide an incremental analysis of the impact of each possible source of bias on the final network predictions. |
Address |
1 July-Sept. 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HuPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ PPJ2019 |
Serial |
3312 |
Permanent link to this record |
|
|
|
Author |
Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal |
Title |
DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
Volume |
12823 |
Issue |
|
Pages |
555–568 |
Keywords |
|
Abstract |
Despite significant progress in current state-of-the-art image generation models, synthesizing document images containing multiple and complex object layouts remains a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. Given a spatial layout (bounding boxes with object categories) provided by the user as a reference, the proposed DocSynth model learns to generate a set of realistic document images consistent with the defined layout. The framework also serves as a strong baseline for creating synthetic document image datasets that augment real data when training models for document layout analysis tasks. Different sets of learning objectives are also used to improve model performance. Quantitatively, we compare the generated results of our model with real data using standard evaluation metrics. The results highlight that our model can successfully generate realistic and diverse document images with multiple objects. We also present a comprehensive qualitative analysis of the different scopes of the synthetic image generation task. Lastly, to our knowledge, this is the first work of its kind. |
Address |
Lausanne; Switzerland; September 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ BRL2021a |
Serial |
3573 |
Permanent link to this record |
|
|
|
Author |
Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal |
Title |
Graph-Based Deep Generative Modelling for Document Layout Generation |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
Volume |
12917 |
Issue |
|
Pages |
525-537 |
Keywords |
|
Abstract |
One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real-world scenarios, the principal information about their content is stored in the layout itself. In this work, we propose an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, especially in digital mailroom applications. It is also the first graph-based approach for the document layout generation task evaluated on administrative document images, in this case invoices. |
Address |
Lausanne; Switzerland; September 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ BRL2021 |
Serial |
3676 |
Permanent link to this record |
|
|
|
Author |
Fatemeh Noroozi; Ciprian Corneanu; Dorota Kamińska; Tomasz Sapiński; Sergio Escalera; Gholamreza Anbarjafari |
Title |
Survey on Emotional Body Gesture Recognition |
Type |
Journal Article |
Year |
2021 |
Publication |
IEEE Transactions on Affective Computing |
Abbreviated Journal |
TAC |
Volume |
12 |
Issue |
2 |
Pages |
505-523 |
Keywords |
|
Abstract |
Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as “body language” and comment on general aspects such as gender differences and cultural dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment on static and dynamic body pose estimation methods, both in RGB and 3D. We then review the recent literature on representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g. human detection and pose estimation) are nowadays mature technologies fully developed for robust large-scale analysis, we show that for emotion recognition the quantity of labelled data is scarce, there is no agreement on clearly defined output spaces, and the representations are shallow and largely based on naive geometrical features. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ NCK2021 |
Serial |
3657 |
Permanent link to this record |
|
|
|
Author |
Arturo Fuentes; F. Javier Sanchez; Thomas Voncina; Jorge Bernal |
Title |
LAMV: Learning to Predict Where Spectators Look in Live Music Performances |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |
Abbreviated Journal |
|
Volume |
5 |
Issue |
|
Pages |
500-507 |
Keywords |
|
Abstract |
The advent of artificial intelligence has brought an evolution in how different daily work tasks are performed. The analysis of cultural content has seen a huge boost from the development of computer-assisted methods that allow easy and transparent data access. In our case, we deal with automating the production of live shows, such as music concerts, aiming to develop a system that can indicate to the producer which camera to show based on what each of them is capturing. In this context, we consider it essential to understand where spectators look and what they are interested in, so that the computational method can learn from this information. The work presented here shows the results of a first preliminary study in which we compare areas of interest defined by human beings with those indicated by an automatic system. Our system is based on extracting motion textures from dynamic Spatio-Temporal Volumes (STV) and then analyzing the patterns by means of texture analysis techniques. We validate our approach on several video sequences that have been labeled by 16 different experts. Our method is able to match the relevant areas identified by the experts, achieving recall scores higher than 80% when a distance of 80 pixels between the method and the ground truth is considered. Current performance shows promise for detecting abnormal peaks and movement trends. |
Address |
Virtual; February 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
VISIGRAPP |
Notes |
MV; ISE; 600.119 |
Approved |
no |
Call Number |
Admin @ si @ FSV2021 |
Serial |
3570 |
Permanent link to this record |
|
|
|
Author |
Ahmed M. A. Salih; Ilaria Boscolo Galazzo; Zahra Zahra Raisi-Estabragh; Steffen E. Petersen; Polyxeni Gkontra; Karim Lekadir; Gloria Menegaz; Petia Radeva |
Title |
A new scheme for the assessment of the robustness of Explainable Methods Applied to Brain Age estimation |
Type |
Conference Article |
Year |
2021 |
Publication |
34th International Symposium on Computer-Based Medical Systems |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
492-497 |
Keywords |
|
Abstract |
Deep learning methods show great promise in a range of settings, including the biomedical field. Explainability of these models is important in such fields for building end-user trust and for facilitating their confident deployment. Although several machine learning interpretability tools have been proposed so far, there is currently no recognized evaluation standard for transferring explainability results into a quantitative score. Several measures have been proposed as proxies for the quantitative assessment of explainability methods. However, the robustness of the list of significant features provided by explainability methods has not been addressed. In this work, we propose a new proxy for assessing the robustness of the list of significant features provided by two explainability methods. Our validation is defined at the functionality-grounded level, based on the rank correlation statistical index, and we demonstrate its successful application in the framework of brain age estimation. We assessed our proxy by estimating brain age using neuroscience data. Our results indicate small variability and high robustness in the considered explainability methods under this new proxy. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CBMS |
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ SBZ2021 |
Serial |
3629 |
Permanent link to this record |
|
|
|
Author |
Clementine Decamps; Alexis Arnaud; Florent Petitprez; Mira Ayadi; Aurelia Baures; Lucile Armenoult; Sergio Escalera; Isabelle Guyon; Remy Nicolle; Richard Tomasini; Aurelien de Reynies; Jerome Cros; Yuna Blum; Magali Richard |
Title |
DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification |
Type |
Journal Article |
Year |
2021 |
Publication |
BMC Bioinformatics |
Abbreviated Journal |
|
Volume |
22 |
Issue |
|
Pages |
473 |
Keywords |
|
Abstract |
Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities. Bioinformatic tools to assess the different cell populations from single-omic datasets, such as bulk transcriptome or methylome samples, have recently been developed, including reference-based and reference-free methods. Improved methods using multi-omic datasets are yet to be developed, and the community will need systematic tools to perform a comparative evaluation of these algorithms on controlled data. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ DAP2021 |
Serial |
3650 |
Permanent link to this record |
|
|
|
Author |
Javad Zolfaghari Bengar; Bogdan Raducanu; Joost Van de Weijer |
Title |
When Deep Learners Change Their Mind: Learning Dynamics for Active Learning |
Type |
Conference Article |
Year |
2021 |
Publication |
19th International Conference on Computer Analysis of Images and Patterns |
Abbreviated Journal |
|
Volume |
13052 |
Issue |
1 |
Pages |
403-413 |
Keywords |
|
Abstract |
Active learning aims to select the samples for annotation that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples, based on the certainty of the network predictions for those samples. However, it is well known that neural networks are overly confident about their predictions and are therefore an untrustworthy source for assessing sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely, we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to a sample during training and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results. |
Address |
September 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CAIP |
Notes |
LAMP |
Approved |
no |
Call Number |
Admin @ si @ ZRV2021 |
Serial |
3673 |
Permanent link to this record |
|
|
|
Author |
Pau Riba; Adria Molina; Lluis Gomez; Oriol Ramos Terrades; Josep Llados |
Title |
Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
Volume |
12822 |
Issue |
|
Pages |
381–395 |
Keywords |
|
Abstract |
In this paper, we explore and evaluate the use of ranking-based objective functions for simultaneously learning a word string encoder and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score is set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both handwritten and real-scene word images. We also provide results for query-by-example word spotting, although it is not the main focus of this work. |
Address |
Lausanne; Switzerland; September 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICDAR |
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ RMG2021 |
Serial |
3572 |
Permanent link to this record |
|
|
|
Author |
Kaustubh Kulkarni; Ciprian Corneanu; Ikechukwu Ofodile; Sergio Escalera; Xavier Baro; Sylwia Hyniewska; Juri Allik; Gholamreza Anbarjafari |
Title |
Automatic Recognition of Facial Displays of Unfelt Emotions |
Type |
Journal Article |
Year |
2021 |
Publication |
IEEE Transactions on Affective Computing |
Abbreviated Journal |
TAC |
Volume |
12 |
Issue |
2 |
Pages |
377-390 |
Keywords |
|
Abstract |
Humans modify their facial expressions in order to communicate their internal states, and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We propose SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with the underlying emotional state. We show that, overall, the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. The performance of the proposed model shows that, on average, it is easier to distinguish among genuine facial expressions of emotion than among unfelt ones, and that certain emotion pairs, such as contempt and disgust, are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state-of-the-art results on the CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on the BP4D dataset. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ KCO2021 |
Serial |
3658 |
Permanent link to this record |
|
|
|
Author |
Joan Codina-Filba; Sergio Escalera; Joan Escudero; Coen Antens; Pau Buch-Cardona; Mireia Farrus |
Title |
Mobile eHealth Platform for Home Monitoring of Bipolar Disorder |
Type |
Conference Article |
Year |
2021 |
Publication |
27th ACM International Conference on Multimedia Modeling |
Abbreviated Journal |
|
Volume |
12573 |
Issue |
|
Pages |
330-341 |
Keywords |
|
Abstract |
People suffering from Bipolar Disorder (BD) experience changes in mood status, having depressive or manic episodes with normal periods in between. BD is a chronic disease with a high level of non-adherence to medication that requires continuous monitoring of patients to detect when they relapse into an episode, so that physicians can take care of them. Here we present MoodRecord, an easy-to-use, non-intrusive, multilingual, robust and scalable platform suitable for home monitoring of patients with BD, which allows physicians and relatives to track the patient's state and receive alarms when abnormalities occur.
MoodRecord takes advantage of the capabilities of smartphones as communication and recording devices to continuously monitor patients. It automatically records user activity, and asks users to answer some questions or to record themselves on video, according to a predefined plan designed by physicians. The video is analysed, the mood status is recognised from images, and bipolar assessment scores are extracted from speech parameters. The data obtained from the different sources are merged periodically to observe whether a relapse may be starting and, if so, to raise the corresponding alarm. The application received a positive evaluation in a pilot with users from three different countries. During the pilot, the predictions of the voice and image modules showed a coherent correlation with the diagnoses made by clinicians. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
MMM |
Notes |
HUPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ CEE2021 |
Serial |
3659 |
Permanent link to this record |
|
|
|
Author |
Adria Molina; Pau Riba; Lluis Gomez; Oriol Ramos Terrades; Josep Llados |
Title |
Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
Volume |
12822 |
Issue |
|
Pages |
306-320 |
Keywords |
|
Abstract |
This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate date estimation as a retrieval task in which, given a query, the retrieved images are ranked in terms of estimated date similarity: the closer their embedded representations, the closer their dates. Contrary to traditional models that train a neural network as a classifier or a regressor, we propose a learning objective based on the nDCG ranking metric. We have experimentally evaluated the performance of the method on two different tasks, date estimation and date-sensitive image retrieval, using the public DEW database, outperforming the baseline methods. |
Address |
Lausanne; Switzerland; September 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICDAR |
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ MRG2021b |
Serial |
3571 |
Permanent link to this record |
|
|
|
Author |
Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal |
Title |
Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts |
Type |
Journal Article |
Year |
2021 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
Volume |
24 |
Issue |
|
Pages |
269–281 |
Keywords |
|
Abstract |
Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding the structure of a document page in terms of its layout provides contextual support that helps in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we make an important contribution over prior art by defining the task of instance segmentation in the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, e.g., the proper wrapping of text around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ BRL2021b |
Serial |
3574 |
Permanent link to this record |