|
Josep Llados, Daniel Lopresti, & Seiichi Uchida (Eds.). (2021). 16th International Conference, 2021, Proceedings, Part II (Vol. 12822). LNCS. Springer Cham.
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
|
|
|
Ruben Tito, Dimosthenis Karatzas, & Ernest Valveny. (2021). Document Collection Visual Question Answering. In 16th International Conference on Document Analysis and Recognition (Vol. 12822, pp. 778–792). LNCS.
Abstract: Current tasks and methods in Document Understanding aims to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices), that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA) a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset we propose a new evaluation metric and baselines which provide further insights to the new dataset and task.
Keywords: Document collection; Visual Question Answering
|
|
|
Josep Llados, Daniel Lopresti, & Seiichi Uchida (Eds.). (2021). 16th International Conference, 2021, Proceedings, Part I (Vol. 12821). LNCS. Springer Cham.
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: historical document analysis, document analysis systems, handwriting recognition, scene text detection and recognition, document image processing, natural language processing (NLP) for document understanding, and graphics, diagram and math recognition.
|
|
|
Asma Bensalah, Jialuo Chen, Alicia Fornes, Cristina Carmona_Duarte, Josep Llados, & Miguel A. Ferrer. (2020). Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches. In International Workshop on Artificial Intelligence for Healthcare Applications (Vol. 12661, pp. 476–489).
Abstract: Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods. In addition, the difficulties increase in unconstrained rehabilitation scenarios, which are much closer to the real use cases. In particular, our aim is to design an upper-limb assessment pipeline for stroke patients using smartwatches. We focus on the HAR task, as it is the first part of the assessing pipeline. Our main target is to automatically detect and recognize four key movements inspired by the Fugl-Meyer assessment scale, which are performed in both constrained and unconstrained scenarios. In addition to the application protocol and dataset, we propose two detection and classification baseline methods. We believe that the proposed framework, dataset and baseline results will serve to foster this research field.
|
|
|
Bartlomiej Twardowski, Pawel Zawistowski, & Szymon Zaborowski. (2021). Metric Learning for Session-Based Recommendations. In 43rd edition of the annual BCS-IRSG European Conference on Information Retrieval (Vol. 12656, pp. 650–665). LNCS.
Abstract: Session-based recommenders, used for making predictions out of users’ uninterrupted sequences of actions, are attractive for many applications. Here, for this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures dissimilarity between the provided sequence of users’ events and the next action. We discuss and compare metric learning approaches to commonly used learning-to-rank methods, where some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither extensively big nor deep architectures are necessary in order to outperform existing methods. The experimental results against strong baselines on four datasets are provided with an ablation study.
Keywords: Session-based recommendations; Deep metric learning; Learning to rank
|
|
|
Joan Codina-Filba, Sergio Escalera, Joan Escudero, Coen Antens, Pau Buch-Cardona, & Mireia Farrus. (2021). Mobile eHealth Platform for Home Monitoring of Bipolar Disorder. In 27th ACM International Conference on Multimedia Modeling (Vol. 12573, pp. 330–341). LNCS.
Abstract: People suffering Bipolar Disorder (BD) experiment changes in mood status having depressive or manic episodes with normal periods in the middle. BD is a chronic disease with a high level of non-adherence to medication that needs a continuous monitoring of patients to detect when they relapse in an episode, so that physicians can take care of them. Here we present MoodRecord, an easy-to-use, non-intrusive, multilingual, robust and scalable platform suitable for home monitoring patients with BD, that allows physicians and relatives to track the patient state and get alarms when abnormalities occur.
MoodRecord takes advantage of the capabilities of smartphones as a communication and recording device to do a continuous monitoring of patients. It automatically records user activity, and asks the user to answer some questions or to record himself in video, according to a predefined plan designed by physicians. The video is analysed, recognising the mood status from images and bipolar assessment scores are extracted from speech parameters. The data obtained from the different sources are merged periodically to observe if a relapse may start and if so, raise the corresponding alarm. The application got a positive evaluation in a pilot with users from three different countries. During the pilot, the predictions of the voice and image modules showed a coherent correlation with the diagnosis performed by clinicians.
|
|
|
Tomas Sixta, Julio C. S. Jacques Junior, Pau Buch Cardona, Eduard Vazquez, & Sergio Escalera. (2020). FairFace Challenge at ECCV 2020: Analyzing Bias in Face Recognition. In ECCV Workshops (Vol. 12540, pp. 463–481). LNCS.
Abstract: This work summarizes the 2020 ChaLearn Looking at People Fair Face Recognition and Analysis Challenge and provides a description of the top-winning solutions and analysis of the results. The aim of the challenge was to evaluate accuracy and bias in gender and skin colour of submitted algorithms on the task of 1:1 face verification in the presence of other confounding attributes. Participants were evaluated using an in-the-wild dataset based on reannotated IJB-C, further enriched 12.5K new images and additional labels. The dataset is not balanced, which simulates a real world scenario where AI-based models supposed to present fair outcomes are trained and evaluated on imbalanced data. The challenge attracted 151 participants, who made more 1.8K submissions in total. The final phase of the challenge attracted 36 active teams out of which 10 exceeded 0.999 AUC-ROC while achieving very low scores in the proposed bias metrics. Common strategies by the participants were face pre-processing, homogenization of data distributions, the use of bias aware loss functions and ensemble models. The analysis of top-10 teams shows higher false positive rates (and lower false negative rates) for females with dark skin tone as well as the potential of eyeglasses and young age to increase the false positive rates too.
|
|
|
Martin Menchon, Estefania Talavera, Jose M. Massa, & Petia Radeva. (2020). Behavioural Pattern Discovery from Collections of Egocentric Photo-Streams. In ECCV Workshops (Vol. 12538, pp. 469–484). LNCS.
Abstract: The automatic discovery of behaviour is of high importance when aiming to assess and improve the quality of life of people. Egocentric images offer a rich and objective description of the daily life of the camera wearer. This work proposes a new method to identify a person’s patterns of behaviour from collected egocentric photo-streams. Our model characterizes time-frames based on the context (place, activities and environment objects) that define the images composition. Based on the similarity among the time-frames that describe the collected days for a user, we propose a new unsupervised greedy method to discover the behavioural pattern set based on a novel semantic clustering approach. Moreover, we present a new score metric to evaluate the performance of the proposed algorithm. We validate our method on 104 days and more than 100k images extracted from 7 users. Results show that behavioural patterns can be discovered to characterize the routine of individuals and consequently their lifestyle.
|
|
|
Parichehr Behjati Ardakani, Diego Velazquez, Josep M. Gonfaus, Pau Rodriguez, Xavier Roca, & Jordi Gonzalez. (2019). Catastrophic interference in Disguised Face Recognition. In 9th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 11868, pp. 64–75). LNCS.
Abstract: It is commonly known the natural tendency of artificial neural networks to completely and abruptly forget previously known information when learning new information. We explore this behaviour in the context of Face Verification on the recently proposed Disguised Faces in the Wild dataset (DFW). We empirically evaluate several commonly used DCNN architectures on Face Recognition and distill some insights about the effect of sequential learning on distinct identities from different datasets, showing that the catastrophic forgetness phenomenon is present even in feature embeddings fine-tuned on different tasks from the original domain.
Keywords: Neural network forgetness; Face recognition; Disguised Faces
|
|
|
Eduardo Aguilar, & Petia Radeva. (2019). Food Recognition by Integrating Local and Flat Classifiers. In 9th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 11867, pp. 65–74). LNCS.
Abstract: The recognition of food image is an interesting research topic, in which its applicability in the creation of nutritional diaries stands out with the aim of improving the quality of life of people with a chronic disease (e.g. diabetes, heart disease) or prone to acquire it (e.g. people with overweight or obese). For a food recognition system to be useful in real applications, it is necessary to recognize a huge number of different foods. We argue that for very large scale classification, a traditional flat classifier is not enough to acquire an acceptable result. To address this, we propose a method that performs prediction with local classifiers, based on a class hierarchy, or with flat classifier. We decide which approach to use, depending on the analysis of both the Epistemic Uncertainty obtained for the image in the children classifiers and the prediction of the parent classifier. When our criterion is met, the final prediction is obtained with the respective local classifier; otherwise, with the flat classifier. From the results, we can see that the proposed method improves the classification performance compared to the use of a single flat classifier.
|
|
|
Gemma Rotger, Francesc Moreno-Noguer, Felipe Lumbreras, & Antonio Agudo. (2019). Single view facial hair 3D reconstruction. In 9th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 11867, pp. 423–436). LNCS.
Abstract: n this work, we introduce a novel energy-based framework that addresses the challenging problem of 3D reconstruction of facial hair from a single RGB image. To this end, we identify hair pixels over the image via texture analysis and then determine individual hair fibers that are modeled by means of a parametric hair model based on 3D helixes. We propose to minimize an energy composed of several terms, in order to adapt the hair parameters that better fit the image detections. The final hairs respond to the resulting fibers after a post-processing step where we encourage further realism. The resulting approach generates realistic facial hair fibers from solely an RGB image without assuming any training data nor user interaction. We provide an experimental evaluation on real-world pictures where several facial hair styles and image conditions are observed, showing consistent results and establishing a comparison with respect to competing approaches.
Keywords: 3D Vision; Shape Reconstruction; Facial Hair Modeling
|
|
|
Debora Gil, Antonio Esteban Lansaque, Sebastian Stefaniga, Mihail Gaianu, & Carles Sanchez. (2019). Data Augmentation from Sketch. In International Workshop on Uncertainty for Safe Utilization of Machine Learning in Medical Imaging (Vol. 11840, pp. 155–162). LNCS.
Abstract: State of the art machine learning methods need huge amounts of data with unambiguous annotations for their training. In the context of medical imaging this is, in general, a very difficult task due to limited access to clinical data, the time required for manual annotations and variability across experts. Simulated data could serve for data augmentation provided that its appearance was comparable to the actual appearance of intra-operative acquisitions. Generative Adversarial Networks (GANs) are a powerful tool for artistic style transfer, but lack a criteria for selecting epochs ensuring also preservation of intra-operative content.
We propose a multi-objective optimization strategy for a selection of cycleGAN epochs ensuring a mapping between virtual images and the intra-operative domain preserving anatomical content. Our approach has been applied to simulate intra-operative bronchoscopic videos and chest CT scans from virtual sketches generated using simple graphical primitives.
Keywords: Data augmentation; cycleGANs; Multi-objective optimization
|
|
|
Eduardo Aguilar, & Petia Radeva. (2019). Class-Conditional Data Augmentation Applied to Image Classification. In 18th International Conference on Computer Analysis of Images and Patterns (Vol. 11679, pp. 182–192). LNCS.
Abstract: Image classification is widely researched in the literature, where models based on Convolutional Neural Networks (CNNs) have provided better results. When data is not enough, CNN models tend to be overfitted. To deal with this, often, traditional techniques of data augmentation are applied, such as: affine transformations, adjusting the color balance, among others. However, we argue that some techniques of data augmentation may be more appropriate for some of the classes. In order to select the techniques that work best for particular class, we propose to explore the epistemic uncertainty for the samples within each class. From our experiments, we can observe that when the data augmentation is applied class-conditionally, we improve the results in terms of accuracy and also reduce the overall epistemic uncertainty. To summarize, in this paper we propose a class-conditional data augmentation procedure that allows us to obtain better results and improve robustness of the classification in the face of model uncertainty.
Keywords: CNNs; Data augmentation; Deep learning; Epistemic uncertainty; Image classification; Food recognition
|
|
|
Estefania Talavera, Nicolai Petkov, & Petia Radeva. (2019). Unsupervised Routine Discovery in Egocentric Photo-Streams. In 18th International Conference on Computer Analysis of Images and Patterns (Vol. 11678, pp. 576–588). LNCS.
Abstract: The routine of a person is defined by the occurrence of activities throughout different days, and can directly affect the person’s health. In this work, we address the recognition of routine related days. To do so, we rely on egocentric images, which are recorded by a wearable camera and allow to monitor the life of the user from a first-person view perspective. We propose an unsupervised model that identifies routine related days, following an outlier detection approach. We test the proposed framework over a total of 72 days in the form of photo-streams covering around 2 weeks of the life of 5 different camera wearers. Our model achieves an average of 76% Accuracy and 68% Weighted F-Score for all the users. Thus, we show that our framework is able to recognise routine related days and opens the door to the understanding of the behaviour of people.
Keywords: Routine discovery; Lifestyle; Egocentric vision; Behaviour analysis
|
|
|
Victoria Ruiz, Angel Sanchez, Jose F. Velez, & Bogdan Raducanu. (2019). Automatic Image-Based Waste Classification. In International Work-Conference on the Interplay Between Natural and Artificial Computation. From Bioinspired Systems and Biomedical Applications to Machine Learning (Vol. 11487, 422–431). LNCS.
Abstract: The management of solid waste in large urban environments has become a complex problem due to increasing amount of waste generated every day by citizens and companies. Current Computer Vision and Deep Learning techniques can help in the automatic detection and classification of waste types for further recycling tasks. In this work, we use the TrashNet dataset to train and compare different deep learning architectures for automatic classification of garbage types. In particular, several Convolutional Neural Networks (CNN) architectures were compared: VGG, Inception and ResNet. The best classification results were obtained using a combined Inception-ResNet model that achieved 88.6% of accuracy. These are the best results obtained with the considered dataset.
Keywords: Computer Vision; Deep learning; Convolutional neural networks; Waste classification
|
|