Author Bartlomiej Twardowski; Pawel Zawistowski; Szymon Zaborowski
Title Metric Learning for Session-Based Recommendations Type Conference Article
Year 2021 Publication 43rd edition of the annual BCS-IRSG European Conference on Information Retrieval Abbreviated Journal
Volume 12656 Issue Pages 650-665
Keywords Session-based recommendations; Deep metric learning; Learning to rank
Abstract Session-based recommenders, used for making predictions out of users’ uninterrupted sequences of actions, are attractive for many applications. Here, for this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures dissimilarity between the provided sequence of users’ events and the next action. We discuss and compare metric learning approaches to commonly used learning-to-rank methods, where some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither extensively big nor deep architectures are necessary in order to outperform existing methods. The experimental results against strong baselines on four datasets are provided with an ablation study.
Address Virtual; March 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECIR
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ TZZ2021 Serial 3586
Permanent link to this record
 

 
Author Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
Title ICDAR 2021 Competition on Document Visual Question Answering Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 635-649
Keywords
Abstract In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced one on Infographics VQA. Infographics VQA is based on a new dataset of more than 5,000 infographics images and 30,000 question-answer pairs. The winning methods scored 0.6120 ANLS in the Infographics VQA task, 0.7743 ANLSL in the Document Collection VQA task and 0.8705 ANLS in the Single Document VQA task. We present a summary of the datasets used for each task, a description of each of the submitted methods, and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.
Address Virtual; Lausanne; Switzerland; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ TMJ2021 Serial 3624
Permanent link to this record
 

 
Author Ricardo Dario Perez Principi; Cristina Palmero; Julio C. S. Jacques Junior; Sergio Escalera
Title On the Effect of Observed Subject Biases in Apparent Personality Analysis from Audio-visual Signals Type Journal Article
Year 2021 Publication IEEE Transactions on Affective Computing Abbreviated Journal TAC
Volume 12 Issue 3 Pages 607-621
Keywords
Abstract Personality perception is implicitly biased due to many subjective factors, such as cultural, social, contextual, gender and appearance. Approaches developed for automatic personality perception are not expected to predict the real personality of the target, but the personality that external observers attribute to it. Hence, they have to deal with human bias, inherently transferred to the training data. However, bias analysis in personality computing is an almost unexplored area. In this work, we study different possible sources of bias affecting personality perception, including emotions from facial expressions, attractiveness, age, gender, and ethnicity, as well as their influence on prediction ability for apparent personality estimation. To this end, we propose a multi-modal deep neural network that combines raw audio and visual information alongside predictions of attribute-specific models to regress apparent personality. We also analyse spatio-temporal aggregation schemes and the effect of different time intervals on first impressions. We base our study on the ChaLearn First Impressions dataset, consisting of one-person conversational videos. Our model shows state-of-the-art results regressing apparent personality based on the Big-Five model. Furthermore, given the interpretable nature of our network design, we provide an incremental analysis of the impact of each possible source of bias on final network predictions.
Address 1 July-Sept. 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ PPJ2019 Serial 3312
Permanent link to this record
 

 
Author Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 12823 Issue Pages 555–568
Keywords
Abstract Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images consistent with the defined layout. Also, this framework has been adapted to this work as a superior baseline model for creating synthetic document image datasets for augmenting real data during training for document layout analysis tasks. Different sets of learning objectives have also been used to improve the model performance. Quantitatively, we also compare the generated results of our model with real data using standard evaluation metrics. The results highlight that our model can successfully generate realistic and diverse document images with multiple objects. We also present a comprehensive qualitative analysis summary of the different scopes of synthetic image generation tasks. Lastly, to our knowledge this is the first work of its kind.
Address Lausanne; Switzerland; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.140; 110.312 Approved no
Call Number Admin @ si @ BRL2021a Serial 3573
Permanent link to this record
 

 
Author Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title Graph-Based Deep Generative Modelling for Document Layout Generation Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 12917 Issue Pages 525-537
Keywords
Abstract One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, especially in digital mailroom applications. It is also the first graph-based approach for the document layout generation task evaluated on administrative document images, in this case, invoices.
Address Lausanne; Switzerland; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.140; 110.312 Approved no
Call Number Admin @ si @ BRL2021 Serial 3676
Permanent link to this record
 

 
Author Fatemeh Noroozi; Ciprian Corneanu; Dorota Kamińska; Tomasz Sapiński; Sergio Escalera; Gholamreza Anbarjafari
Title Survey on Emotional Body Gesture Recognition Type Journal Article
Year 2021 Publication IEEE Transactions on Affective Computing Abbreviated Journal TAC
Volume 12 Issue 2 Pages 505-523
Keywords
Abstract Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as “body language” and comment on general aspects such as gender differences and culture dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment on static and dynamic body pose estimation methods both in RGB and 3D. We then comment on the recent literature related to representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g. human detection and pose estimation) are nowadays mature technologies fully developed for robust large scale analysis, we show that for emotion recognition the quantity of labelled data is scarce, there is no agreement on clearly defined output spaces and the representations are shallow and largely based on naive geometrical representations.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ NCK2021 Serial 3657
Permanent link to this record
 

 
Author Arturo Fuentes; F. Javier Sanchez; Thomas Voncina; Jorge Bernal
Title LAMV: Learning to Predict Where Spectators Look in Live Music Performances Type Conference Article
Year 2021 Publication 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Abbreviated Journal
Volume 5 Issue Pages 500-507
Keywords
Abstract The advent of artificial intelligence has brought about a change in how different daily work tasks are performed. The analysis of cultural content has seen a huge boost from the development of computer-assisted methods that allow easy and transparent data access. In our case, we deal with the automation of the production of live shows, like music concerts, aiming to develop a system that can indicate to the producer which camera to show based on what each of them is showing. In this context, we consider it essential to understand where spectators look and what they are interested in so the computational method can learn from this information. The work that we present here shows the results of a first preliminary study in which we compare areas of interest defined by human beings and those indicated by an automatic system. Our system is based on the extraction of motion textures from dynamic Spatio-Temporal Volumes (STV) and then analyzing the patterns by means of texture analysis techniques. We validate our approach over several video sequences that have been labeled by 16 different experts. Our method is able to match those relevant areas identified by the experts, achieving recall scores higher than 80% when a distance of 80 pixels between method and ground truth is considered. Current performance shows promise when detecting abnormal peaks and movement trends.
Address Virtual; February 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes MV; ISE; 600.119; Approved no
Call Number Admin @ si @ FSV2021 Serial 3570
Permanent link to this record
 

 
Author Ahmed M. A. Salih; Ilaria Boscolo Galazzo; Zahra Raisi-Estabragh; Steffen E. Petersen; Polyxeni Gkontra; Karim Lekadir; Gloria Menegaz; Petia Radeva
Title A new scheme for the assessment of the robustness of Explainable Methods Applied to Brain Age estimation Type Conference Article
Year 2021 Publication 34th International Symposium on Computer-Based Medical Systems Abbreviated Journal
Volume Issue Pages 492-497
Keywords
Abstract Deep learning methods show great promise in a range of settings including the biomedical field. Explainability of these models is important in these fields for building end-user trust and to facilitate their confident deployment. Although several Machine Learning Interpretability tools have been proposed so far, there is currently no recognized evaluation standard to transfer the explainability results into a quantitative score. Several measures have been proposed as proxies for quantitative assessment of explainability methods. However, the robustness of the list of significant features provided by the explainability methods has not been addressed. In this work, we propose a new proxy for assessing the robustness of the list of significant features provided by two explainability methods. Our validation is defined at functionality-grounded level based on the ranked correlation statistical index and demonstrates its successful application in the framework of brain aging estimation. We assessed our proxy to estimate brain age using neuroscience data. Our results indicate small variability and high robustness in the considered explainability methods using this new proxy.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CBMS
Notes MILAB; no proj Approved no
Call Number Admin @ si @ SBZ2021 Serial 3629
Permanent link to this record
 

 
Author Clementine Decamps; Alexis Arnaud; Florent Petitprez; Mira Ayadi; Aurelia Baures; Lucile Armenoult; Sergio Escalera; Isabelle Guyon; Remy Nicolle; Richard Tomasini; Aurelien de Reynies; Jerome Cros; Yuna Blum; Magali Richard
Title DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification Type Journal Article
Year 2021 Publication BMC Bioinformatics Abbreviated Journal
Volume 22 Issue Pages 473
Keywords
Abstract Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities. Bioinformatic tools to assess the different cell populations from single-omic datasets, such as bulk transcriptome or methylome samples, have been recently developed, including reference-based and reference-free methods. Improved methods using multi-omic datasets are yet to be developed, and the community would need systematic tools to perform a comparative evaluation of these algorithms on controlled data.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ DAP2021 Serial 3650
Permanent link to this record
 

 
Author Javad Zolfaghari Bengar; Bogdan Raducanu; Joost Van de Weijer
Title When Deep Learners Change Their Mind: Learning Dynamics for Active Learning Type Conference Article
Year 2021 Publication 19th International Conference on Computer Analysis of Images and Patterns Abbreviated Journal
Volume 13052 Issue 1 Pages 403-413
Keywords
Abstract Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
Address September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CAIP
Notes LAMP; Approved no
Call Number Admin @ si @ ZRV2021 Serial 3673
Permanent link to this record
 

 
Author Pau Riba; Adria Molina; Lluis Gomez; Oriol Ramos Terrades; Josep Llados
Title Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 12822 Issue Pages 381–395
Keywords
Abstract In this paper, we explore and evaluate the use of ranking-based objective functions for simultaneously learning a word string encoder and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both handwritten and real scene word images. We also provide the results for query-by-example word spotting, although it is not the main focus of this work.
Address Lausanne; Switzerland; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.121; 600.140; 110.312 Approved no
Call Number Admin @ si @ RMG2021 Serial 3572
Permanent link to this record
 

 
Author Kaustubh Kulkarni; Ciprian Corneanu; Ikechukwu Ofodile; Sergio Escalera; Xavier Baro; Sylwia Hyniewska; Juri Allik; Gholamreza Anbarjafari
Title Automatic Recognition of Facial Displays of Unfelt Emotions Type Journal Article
Year 2021 Publication IEEE Transactions on Affective Computing Abbreviated Journal TAC
Volume 12 Issue 2 Pages 377-390
Keywords
Abstract Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average, it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state-of-the-art results on the CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on the BP4D dataset.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ KCO2021 Serial 3658
Permanent link to this record
 

 
Author Joan Codina-Filba; Sergio Escalera; Joan Escudero; Coen Antens; Pau Buch-Cardona; Mireia Farrus
Title Mobile eHealth Platform for Home Monitoring of Bipolar Disorder Type Conference Article
Year 2021 Publication 27th ACM International Conference on Multimedia Modeling Abbreviated Journal
Volume 12573 Issue Pages 330-341
Keywords
Abstract People suffering from Bipolar Disorder (BD) experience changes in mood, with depressive or manic episodes separated by normal periods. BD is a chronic disease with a high level of non-adherence to medication, so patients need continuous monitoring to detect when they relapse into an episode and allow physicians to take care of them. Here we present MoodRecord, an easy-to-use, non-intrusive, multilingual, robust and scalable platform suitable for home monitoring of patients with BD, which allows physicians and relatives to track the patient's state and receive alarms when abnormalities occur.

MoodRecord takes advantage of the capabilities of smartphones as communication and recording devices to monitor patients continuously. It automatically records user activity and asks the user to answer some questions or to record themselves on video, according to a predefined plan designed by physicians. The video is analysed: the mood status is recognised from images, and bipolar assessment scores are extracted from speech parameters. The data obtained from the different sources are merged periodically to check whether a relapse may be starting and, if so, to raise the corresponding alarm. The application received a positive evaluation in a pilot with users from three different countries. During the pilot, the predictions of the voice and image modules showed a coherent correlation with the diagnosis performed by clinicians.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MMM
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ CEE2021 Serial 3659
Permanent link to this record
 

 
Author Adria Molina; Pau Riba; Lluis Gomez; Oriol Ramos Terrades; Josep Llados
Title Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 12822 Issue Pages 306-320
Keywords
Abstract This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate date estimation as a retrieval task where, given a query, the retrieved images are ranked in terms of the estimated date similarity. The closer their embedded representations, the closer their dates. In contrast to traditional models that design a neural network to learn a classifier or a regressor, we propose a learning objective based on the nDCG ranking metric. We have experimentally evaluated the performance of the method on two different tasks, date estimation and date-sensitive image retrieval, using the DEW public database, outperforming the baseline methods.
Address Lausanne; Switzerland; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.121; 600.140; 110.312 Approved no
Call Number Admin @ si @ MRG2021b Serial 3571
Permanent link to this record
 

 
Author Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts Type Journal Article
Year 2021 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume 24 Issue Pages 269–281
Keywords
Abstract Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we contribute to prior art by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, e.g., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.140; 110.312 Approved no
Call Number Admin @ si @ BRL2021b Serial 3574
Permanent link to this record