Records
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds)
Title 16th International Conference, 2021, Proceedings, Part I Type Book Whole
Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal
Volume 12821 Issue Pages
Keywords
Abstract This four-volume set (LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824) constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: historical document analysis, document analysis systems, handwriting recognition, scene text detection and recognition, document image processing, natural language processing (NLP) for document understanding, and graphics, diagram and math recognition.
Address Lausanne, Switzerland, September 5-10, 2021
Corporate Author Thesis
Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-86548-1 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ Serial 3725
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds)
Title 16th International Conference, 2021, Proceedings, Part II Type Book Whole
Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal
Volume 12822 Issue Pages
Keywords
Abstract This four-volume set (LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824) constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
Address Lausanne, Switzerland, September 5-10, 2021
Corporate Author Thesis
Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-86330-2 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ Serial 3726
 

 
Author Josep Llados
Title The 5G of Document Intelligence Type Conference Article
Year 2021 Publication 3rd Workshop on Future of Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Lausanne; Switzerland; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference FDAR
Notes DAG Approved no
Call Number Admin @ si @ Serial 3677
 

 
Author Jose Luis Gomez; Gabriel Villalonga; Antonio Lopez
Title Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches Type Journal Article
Year 2021 Publication Sensors Abbreviated Journal SENS
Volume 21 Issue 9 Pages 3185
Keywords co-training; multi-modality; vision-based object detection; ADAS; self-driving
Abstract Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN highly depends on both the raw sensor data and their associated ground truth (GT). Collecting such GT is usually done through human labeling, which is time-consuming and does not scale as we wish. This data-labeling bottleneck may be intensified due to domain shifts among image sensors, which could force per-sensor data labeling. In this paper, we focus on the use of co-training, a semi-supervised learning (SSL) method, for obtaining self-labeled object bounding boxes (BBs), i.e., the GT to train deep object detectors. In particular, we assess the goodness of multi-modal co-training by relying on two different views of an image, namely, appearance (RGB) and estimated depth (D). Moreover, we compare appearance-based single-modal co-training with multi-modal. Our results suggest that in a standard SSL setting (no domain shift, a few human-labeled data) and under virtual-to-real domain shift (many virtual-world labeled data, no human-labeled data) multi-modal co-training outperforms single-modal. In the latter case, by performing GAN-based domain translation both co-training modalities are on par, at least when using an off-the-shelf depth estimation model not specifically trained on the translated images.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ GVL2021 Serial 3562
 

 
Author Jose Elias Yauri; Aura Hernandez-Sabate; Pau Folch; Debora Gil
Title Mental Workload Detection Based on EEG Analysis Type Conference Article
Year 2021 Publication Artificial Intelligence Research and Development. Proceedings 23rd International Conference of the Catalan Association for Artificial Intelligence. Abbreviated Journal
Volume 339 Issue Pages 268-277
Keywords Cognitive states; Mental workload; EEG analysis; Neural Networks.
Abstract The study of mental workload is essential for human work efficiency, health and accident prevention, since workload compromises both performance and awareness. Although workload has been widely studied using several physiological measures, minimising the sensor network as much as possible remains both a challenge and a requirement.
Electroencephalogram (EEG) signals have shown a high correlation to specific cognitive and mental states like workload. However, there is not enough evidence in the literature to validate how well models generalize to new subjects performing tasks with a workload similar to those included in the model's training.
In this paper we propose a binary neural network to classify EEG features across different mental workloads. Two workloads, low and medium, are induced using two variants of the N-Back Test. The proposed model was validated on a dataset collected from 16 subjects and showed a high level of generalization capability: the model reported an average recall of 81.81% in a leave-one-subject-out evaluation.
Address Virtual; October 20-22 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CCIA
Notes IAM; 600.139; 600.118; 600.145 Approved no
Call Number Admin @ si @ Serial 3723
 

 
Author Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca
Title Camera pose estimation in multi-view environments: From virtual scenarios to the real world Type Journal Article
Year 2021 Publication Image and Vision Computing Abbreviated Journal IVC
Volume 110 Issue Pages 104182
Keywords
Abstract This paper presents a domain adaptation strategy to efficiently train network architectures for estimating the relative camera pose in multi-view scenarios. The network architectures are fed by a pair of simultaneously acquired images; hence, in order to improve the accuracy of the solutions, and due to the lack of large datasets with pairs of overlapped images, a domain adaptation strategy is proposed. The domain adaptation strategy consists of transferring the knowledge learned from synthetic images to real-world scenarios. For this, the networks are first trained using pairs of synthetic images, which are captured at the same time by a pair of cameras in a virtual environment; then, the learned weights of the networks are transferred to the real-world case, where the networks are retrained with a few real images. Different virtual 3D scenarios are generated to evaluate the relationship between the accuracy of the result and the similarity between virtual and real scenarios, in terms of both the geometry of the objects contained in the scene and the relative pose between camera and objects in the scene. Experimental results and comparisons are provided showing that the accuracy of all the evaluated networks for estimating the camera pose improves when the proposed domain adaptation strategy is used, highlighting the importance of the similarity between virtual and real scenarios.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MSIAU; 600.130; 600.122 Approved no
Call Number Admin @ si @ CSV2021 Serial 3577
 

 
Author Joan Codina-Filba; Sergio Escalera; Joan Escudero; Coen Antens; Pau Buch-Cardona; Mireia Farrus
Title Mobile eHealth Platform for Home Monitoring of Bipolar Disorder Type Conference Article
Year 2021 Publication 27th ACM International Conference on Multimedia Modeling Abbreviated Journal
Volume 12573 Issue Pages 330-341
Keywords
Abstract People suffering from Bipolar Disorder (BD) experience changes in mood, with depressive or manic episodes and normal periods in between. BD is a chronic disease with a high level of non-adherence to medication that requires continuous monitoring of patients to detect when they relapse into an episode, so that physicians can take care of them. Here we present MoodRecord, an easy-to-use, non-intrusive, multilingual, robust and scalable platform suitable for home monitoring of patients with BD, which allows physicians and relatives to track the patient's state and receive alarms when abnormalities occur.

MoodRecord takes advantage of the capabilities of smartphones as communication and recording devices to monitor patients continuously. It automatically records user activity, and asks the user to answer some questions or to record a video of themselves, according to a predefined plan designed by physicians. The video is analysed: mood status is recognised from images, and bipolar assessment scores are extracted from speech parameters. The data obtained from the different sources are merged periodically to check whether a relapse may be starting and, if so, raise the corresponding alarm. The application received a positive evaluation in a pilot with users from three different countries. During the pilot, the predictions of the voice and image modules showed a coherent correlation with the diagnosis performed by clinicians.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MMM
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ CEE2021 Serial 3659
 

 
Author Jialuo Chen; Mohamed Ali Souibgui; Alicia Fornes; Beata Megyesi
Title Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images Type Conference Article
Year 2021 Publication 4th International Conference on Historical Cryptology Abbreviated Journal
Volume Issue Pages 34-37
Keywords
Abstract Historical ciphers contain a wide range of symbols from various symbol sets. Identifying the cipher alphabet is a prerequisite before decryption can take place and is a time-consuming process. In this work we explore the use of image processing for identifying the underlying alphabet in cipher images, and to compare alphabets between ciphers. The experiments show that ciphers with similar alphabets can be successfully discovered through clustering.
Address Virtual; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference HistoCrypt
Notes DAG; 602.230; 600.140; 600.121 Approved no
Call Number Admin @ si @ CSF2021 Serial 3617
 

 
Author Javier Marin; Sergio Escalera
Title SSSGAN: Satellite Style and Structure Generative Adversarial Networks Type Journal Article
Year 2021 Publication Remote Sensing Abbreviated Journal
Volume 13 Issue 19 Pages 3984
Keywords
Abstract This work presents Satellite Style and Structure Generative Adversarial Network (SSSGAN), a generative model of high resolution satellite imagery to support image segmentation. Based on spatially adaptive denormalization modules (SPADE) that modulate the activations with respect to segmentation map structure, in addition to global descriptor vectors that capture the semantic information in a vector with respect to Open Street Maps (OSM) classes, this model is able to produce consistent aerial imagery. By decoupling the generation of aerial images into a structure map and a carefully defined style vector, we were able to improve the realism and geodiversity of the synthesis with respect to the state-of-the-art baseline. Therefore, the proposed model allows us to control the generation not only with respect to the desired structure, but also with respect to a geographic area.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ MaE2021 Serial 3651
 

 
Author Javier M. Olaso; Alain Vazquez; Leila Ben Letaifa; Mikel de Velasco; Aymen Mtibaa; Mohamed Amine Hmani; Dijana Petrovska-Delacretaz; Gerard Chollet; Cesar Montenegro; Asier Lopez-Zorrilla; Raquel Justo; Roberto Santana; Jofre Tenorio-Laranga; Eduardo Gonzalez-Fraile; Begoña Fernandez-Ruanova; Gennaro Cordasco; Anna Esposito; Kristin Beck Gjellesvik; Anna Torp Johansen; Maria Stylianou Kornes; Colin Pickard; Cornelius Glackin; Gary Cahalane; Pau Buch; Cristina Palmero; Sergio Escalera; Olga Gordeeva; Olivier Deroo; Anaïs Fernandez; Daria Kyslitska; Jose Antonio Lozano; Maria Ines Torres; Stephan Schlogl
Title The EMPATHIC Virtual Coach: a demo Type Conference Article
Year 2021 Publication 23rd ACM International Conference on Multimodal Interaction Abbreviated Journal
Volume Issue Pages 848-851
Keywords
Abstract The main objective of the EMPATHIC project has been the design and development of a virtual coach to engage the healthy-senior user and to enhance well-being through awareness of personal status. The EMPATHIC approach addresses this objective through multimodal interactions supported by the GROW coaching model. The paper summarizes the main components of the EMPATHIC Virtual Coach (EMPATHIC-VC) and introduces a demonstration of the coaching sessions in selected scenarios.
Address Virtual; October 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICMI
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ OVB2021 Serial 3644
 

 
Author Javad Zolfaghari Bengar; Joost Van de Weijer; Bartlomiej Twardowski; Bogdan Raducanu
Title Reducing Label Effort: Self-Supervised Meets Active Learning Type Conference Article
Year 2021 Publication International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 1631-1639
Keywords
Abstract Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training, which learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on a few labeled samples. Recent developments in self-training have achieved very impressive results, rivaling supervised learning on some datasets. The current work focuses on whether the two paradigms can benefit from each other. We studied object recognition datasets including CIFAR10, CIFAR100 and Tiny ImageNet with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high. The performance gap between active learning trained either with self-training or from scratch diminishes as we approach the point where almost half of the dataset is labeled.
Address October 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes LAMP; Approved no
Call Number Admin @ si @ ZVT2021 Serial 3672
 

 
Author Javad Zolfaghari Bengar; Bogdan Raducanu; Joost Van de Weijer
Title When Deep Learners Change Their Mind: Learning Dynamics for Active Learning Type Conference Article
Year 2021 Publication 19th International Conference on Computer Analysis of Images and Patterns Abbreviated Journal
Volume 13052 Issue 1 Pages 403-413
Keywords
Abstract Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples, based on the certainty of the network predictions for those samples. However, it is well known that neural networks are overly confident about their predictions and are therefore an untrustworthy source for assessing sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely, we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
Address September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CAIP
Notes LAMP; Approved no
Call Number Admin @ si @ ZRV2021 Serial 3673
 

 
Author Javad Zolfaghari Bengar
Title Reducing Label Effort with Deep Active Learning Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition applications, such as image classification, detection and segmentation. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. In this thesis we study several aspects of active learning including video object detection for autonomous driving systems, image classification on balanced and imbalanced datasets and the incorporation of self-supervised learning in active learning. We briefly describe our approach in each of these areas to reduce the labeling effort.
In chapter two we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our criterion is based on the estimated number of errors in terms of false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two outdoor datasets.
In the next chapter we address the well-known problem of overconfidence in neural networks. As an alternative to network confidence, we propose a new informativeness-based active learning method that captures the learning dynamics of a neural network with a metric called label-dispersion. This metric is low when the network consistently assigns the same label to the sample during the course of training and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
In chapter four, we tackle the problem of sampling bias in active learning methods on imbalanced datasets. Active learning is generally studied on balanced datasets where an equal number of images per class is available. However, real-world datasets suffer from severely imbalanced classes, the so-called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we propose a general optimization framework that explicitly takes class-balancing into account. Results on three datasets show that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we show that our method generally results in a performance gain on balanced datasets as well.
Another paradigm to reduce the annotation effort is self-training, which learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on a few labeled samples. Recent advancements in self-training have achieved very impressive results, rivaling supervised learning on some datasets. In the last chapter we focus on whether active learning and self-supervised learning can benefit from each other.
We study object recognition datasets with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high.
Address December 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer; Bogdan Raducanu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-9-2 Medium
Area Expedition Conference
Notes LAMP; Approved no
Call Number Admin @ si @ Zol2021 Serial 3609
 

 
Author Idoia Ruiz; Lorenzo Porzi; Samuel Rota Bulo; Peter Kontschieder; Joan Serrat
Title Weakly Supervised Multi-Object Tracking and Segmentation Type Conference Article
Year 2021 Publication IEEE Winter Conference on Applications of Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 125-133
Keywords
Abstract We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation. To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just 12% and 12.7% for cars and pedestrians, respectively.
Address Virtual; January 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACVW
Notes ADAS; 600.118; 600.124 Approved no
Call Number Admin @ si @ RPR2021 Serial 3548
 

 
Author Hugo Bertiche; Meysam Madadi; Sergio Escalera
Title Deep Parametric Surfaces for 3D Outfit Reconstruction from Single View Image Type Conference Article
Year 2021 Publication 16th IEEE International Conference on Automatic Face and Gesture Recognition Abbreviated Journal
Volume Issue Pages 1-8
Keywords
Abstract We present a methodology to retrieve analytical surfaces parametrized as a neural network. Previous works on 3D reconstruction yield point clouds, voxelized objects or meshes. Instead, our approach yields 2-manifolds in Euclidean space through deep learning. To this end, we implement a novel formulation for fully connected layers as parametrized manifolds that allows continuous predictions with differential geometry. Based on this property we propose a novel smoothness loss. Results on the CLOTH3D++ dataset show the possibility of inferring different topologies and the benefits of the smoothness term based on differential geometry.
Address Virtual; December 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference FG
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ BME2021 Serial 3640