|   | 
Details
   web
Records
Author Gabriel Villalonga; Antonio Lopez
Title Co-Training for On-Board Deep Object Detection Type Journal Article
Year 2020 Publication IEEE Access Abbreviated Journal ACCESS
Volume Issue Pages 194441 - 194456
Keywords
Abstract Providing ground truth supervision to train visual models has been a bottleneck over the years, exacerbated by domain shifts which degenerate the performance of such models. This was the case when visual tasks relied on handcrafted features and shallow machine learning and, despite its unprecedented performance gains, the problem remains open within the deep learning paradigm due to its data-hungry nature. Best performing deep vision-based object detectors are trained in a supervised manner by relying on human-labeled bounding boxes which localize class instances (i.e. objects) within the training images. Thus, object detection is one of such tasks for which human labeling is a major bottleneck. In this article, we assess co-training as a semi-supervised learning method for self-labeling objects in unlabeled images, so reducing the human-labeling effort for developing deep object detectors. Our study pays special attention to a scenario involving domain shift; in particular, when we have automatically generated virtual-world images with object bounding boxes and we have real-world images which are unlabeled. Moreover, we are particularly interested in using co-training for deep object detection in the context of driver assistance systems and/or self-driving vehicles. Thus, using well-established datasets and protocols for object detection in these application contexts, we will show how co-training is a paradigm worth to pursue for alleviating object labeling, working both alone and together with task-agnostic domain adaptation.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ ViL2020 Serial 3488
Permanent link to this record
 

 
Author Hannes Mueller; Andre Groger; Jonathan Hersh; Andrea Matranga; Joan Serrat
Title Monitoring War Destruction from Space: A Machine Learning Approach Type Miscellaneous
Year 2020 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Existing data on building destruction in conflict zones rely on eyewitness reports or manual detection, which makes it generally scarce, incomplete and potentially biased. This lack of reliable data imposes severe limitations for media reporting, humanitarian relief efforts, human rights monitoring, reconstruction initiatives, and academic studies of violent conflict. This article introduces an automated method of measuring destruction in high-resolution satellite images using deep learning techniques combined with data augmentation to expand training samples. We apply this method to the Syrian civil war and reconstruct the evolution of damage in major cities across the country. The approach allows generating destruction data with unprecedented scope, resolution, and frequency – only limited by the available satellite imagery – which can alleviate data limitations decisively.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ MGH2020 Serial 3489
Permanent link to this record
 

 
Author Yi Xiao; Felipe Codevilla; Akhil Gurram; Onay Urfalioglu; Antonio Lopez
Title Multimodal end-to-end autonomous driving Type Journal Article
Year 2020 Publication IEEE Transactions on Intelligent Transportation Systems Abbreviated Journal TITS
Volume Issue Pages 1-11
Keywords
Abstract A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception and maneuver planning and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from input raw sensor data to vehicle control signals. The later are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e. using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single-modality.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes ADAS Approved no
Call Number Admin @ si @ XCG2020 Serial 3490
Permanent link to this record
 

 
Author Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
Title Real-time Lexicon-free Scene Text Retrieval Type Journal Article
Year 2021 Publication Pattern Recognition Abbreviated Journal PR
Volume 110 Issue Pages 107656
Keywords
Abstract In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes DAG; 600.121; 600.129; 601.338 Approved no
Call Number Admin @ si @ MTD2021 Serial 3493
Permanent link to this record
 

 
Author Lluis Gomez; Anguelos Nicolaou; Marçal Rusiñol; Dimosthenis Karatzas
Title 12 years of ICDAR Robust Reading Competitions: The evolution of reading systems for unconstrained text understanding Type Book Chapter
Year 2020 Publication Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor K. Alahari; C.V. Jawahar
Language Summary Language Original Title
Series Editor Series Title Series on Advances in Computer Vision and Pattern Recognition Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes DAG; 600.121 Approved no
Call Number GNR2020 Serial 3494
Permanent link to this record
 

 
Author Lluis Gomez; Dena Bazazian; Dimosthenis Karatzas
Title Historical review of scene text detection research Type Book Chapter
Year 2020 Publication Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor K. Alahari; C.V. Jawahar
Language Summary Language Original Title
Series Editor Series Title Series on Advances in Computer Vision and Pattern Recognition Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ GBK2020 Serial 3495
Permanent link to this record
 

 
Author Jon Almazan; Lluis Gomez; Suman Ghosh; Ernest Valveny; Dimosthenis Karatzas
Title WATTS: A common representation of word images and strings using embedded attributes for text recognition and retrieval Type Book Chapter
Year 2020 Publication Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor Analysis”, K. Alahari; C.V. Jawahar
Language Summary Language Original Title
Series Editor Series Title Series on Advances in Computer Vision and Pattern Recognition Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ AGG2020 Serial 3496
Permanent link to this record
 

 
Author Aymen Azaza; Joost Van de Weijer; Ali Douik; Javad Zolfaghari Bengar; Marc Masana
Title Saliency from High-Level Semantic Image Features Type Journal
Year 2020 Publication SN Computer Science Abbreviated Journal SN
Volume 1 Issue 4 Pages 1-12
Keywords
Abstract Top-down semantic information is known to play an important role in assigning saliency. Recently, large strides have been made in improving state-of-the-art semantic image understanding in the fields of object detection and semantic segmentation. Therefore, since these methods have now reached a high-level of maturity, evaluation of the impact of high-level image understanding on saliency estimation is now feasible. We propose several saliency features which are computed from object detection and semantic segmentation results. We combine these features with a standard baseline method for saliency detection to evaluate their importance. Experiments demonstrate that the proposed features derived from object detection and semantic segmentation improve saliency estimation significantly. Moreover, they show that our method obtains state-of-the-art results on (FT, ImgSal, and SOD datasets) and obtains competitive results on four other datasets (ECSSD, PASCAL-S, MSRA-B, and HKU-IS).
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes LAMP; 600.120; 600.109; 600.106 Approved no
Call Number Admin @ si @ AWD2020 Serial 3503
Permanent link to this record
 

 
Author Rahma Kalboussi; Aymen Azaza; Joost Van de Weijer; Mehrez Abdellaoui; Ali Douik
Title Object proposals for salient object segmentation in videos Type Journal Article
Year 2020 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 79 Issue 13 Pages 8677-8693
Keywords
Abstract Salient object segmentation in videos is generally broken up in a video segmentation part and a saliency assignment part. Recently, object proposals, which are used to segment the image, have had significant impact on many computer vision applications, including image segmentation, object detection, and recently saliency detection in still images. However, their usage has not yet been evaluated for salient object segmentation in videos. Therefore, in this paper, we investigate the application of object proposals to salient object segmentation in videos. In addition, we propose a new motion feature derived from the optical flow structure tensor for video saliency detection. Experiments on two standard benchmark datasets for video saliency show that the proposed motion feature improves saliency estimation results, and that object proposals are an efficient method for salient object segmentation. Results on the challenging SegTrack v2 and Fukuchi benchmark data sets show that we significantly outperform the state-of-the-art.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes LAMP; 600.120 Approved no
Call Number KAW2020 Serial 3504
Permanent link to this record
 

 
Author Diana Ramirez Cifuentes; Ana Freire; Ricardo Baeza Yates; Joaquim Punti Vidal; Pilar Medina Bravo; Diego Velazquez; Josep M. Gonfaus; Jordi Gonzalez
Title Detection of Suicidal Ideation on Social Media: Multimodal, Relational, and Behavioral Analysis Type Journal Article
Year 2020 Publication Journal of Medical Internet Research Abbreviated Journal JMIR
Volume 22 Issue 7 Pages e17758
Keywords
Abstract Background:
Suicide risk assessment usually involves an interaction between doctors and patients. However, a significant number of people with mental disorders receive no treatment for their condition due to the limited access to mental health care facilities; the reduced availability of clinicians; the lack of awareness; and stigma, neglect, and discrimination surrounding mental disorders. In contrast, internet access and social media usage have increased significantly, providing experts and patients with a means of communication that may contribute to the development of methods to detect mental health issues among social media users.

Objective:
This paper aimed to describe an approach for the suicide risk assessment of Spanish-speaking users on social media. We aimed to explore behavioral, relational, and multimodal data extracted from multiple social platforms and develop machine learning models to detect users at risk.

Methods:
We characterized users based on their writings, posting patterns, relations with other users, and images posted. We also evaluated statistical and deep learning approaches to handle multimodal data for the detection of users with signs of suicidal ideation (suicidal ideation risk group). Our methods were evaluated over a dataset of 252 users annotated by clinicians. To evaluate the performance of our models, we distinguished 2 control groups: users who make use of suicide-related vocabulary (focused control group) and generic random users (generic control group).

Results:
We identified significant statistical differences between the textual and behavioral attributes of each of the control groups compared with the suicidal ideation risk group. At a 95% CI, when comparing the suicidal ideation risk group and the focused control group, the number of friends (P=.04) and median tweet length (P=.04) were significantly different. The median number of friends for a focused control user (median 578.5) was higher than that for a user at risk (median 372.0). Similarly, the median tweet length was higher for focused control users, with 16 words against 13 words of suicidal ideation risk users. Our findings also show that the combination of textual, visual, relational, and behavioral data outperforms the accuracy of using each modality separately. We defined text-based baseline models based on bag of words and word embeddings, which were outperformed by our models, obtaining an increase in accuracy of up to 8% when distinguishing users at risk from both types of control users.

Conclusions:
The types of attributes analyzed are significant for detecting users at risk, and their combination outperforms the results provided by generic, exclusively text-based baseline models. After evaluating the contribution of image-based predictive models, we believe that our results can be improved by enhancing the models based on textual and relational features. These methods can be extended and applied to different use cases related to other mental disorders.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes ISE; 600.098; 600.119 Approved no
Call Number Admin @ si @ RFB2020 Serial 3552
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Asma Bensalah; Jialuo Chen; Alicia Fornes; Michelle Waldispühl
Title A User Perspective on HTR methods for the Automatic Transcription of Rare Scripts: The Case of Codex Runicus Just Accepted Type Journal Article
Year 2023 Publication ACM Journal on Computing and Cultural Heritage Abbreviated Journal JOCCH
Volume 15 Issue 4 Pages 1-18
Keywords
Abstract Recent breakthroughs in Artificial Intelligence, Deep Learning and Document Image Analysis and Recognition have significantly eased the creation of digital libraries and the transcription of historical documents. However, for documents in rare scripts with few labelled training data available, current Handwritten Text Recognition (HTR) systems are too constraint. Moreover, research on HTR often focuses on technical aspects only, and rarely puts emphasis on implementing software tools for scholars in Humanities. In this article, we describe, compare and analyse different transcription methods for rare scripts. We evaluate their performance in a real use case of a medieval manuscript written in the runic script (Codex Runicus) and discuss advantages and disadvantages of each method from the user perspective. From this exhaustive analysis and comparison with a fully manual transcription, we raise conclusions and provide recommendations to scholars interested in using automatic transcription tools.
Address
Corporate Author Thesis
Publisher ACM Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no
Call Number Admin @ si @ SBC2023 Serial 3732
Permanent link to this record
 

 
Author Gemma Rotger
Title Lifelike Humans: Detailed Reconstruction of Expressive Human Faces Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Developing human-like digital characters is a challenging task since humans are used to recognizing our fellows, and find the computed generated characters inadequately humanized. To fulfill the standards of the videogame and digital film productions it is necessary to model and animate these characters the most closely to human beings. However, it is an arduous and expensive task, since many artists and specialists are required to work on a single character. Therefore, to fulfill these requirements we found an interesting option to study the automatic creation of detailed characters through inexpensive setups. In this work, we develop novel techniques to bring detailed characters by combining different aspects that stand out when developing realistic characters, skin detail, facial hairs, expressions, and microexpressions. We examine each of the mentioned areas with the aim of automatically recover each of the parts without user interaction nor training data. We study the problems for their robustness but also for the simplicity of the setup, preferring single-image with uncontrolled illumination and methods that can be easily computed with the commodity of a standard laptop. A detailed face with wrinkles and skin details is vital to develop a realistic character. In this work, we introduce our method to automatically describe facial wrinkles on the image and transfer to the recovered base face. Then we advance to facial hair recovery by resolving a fitting problem with a novel parametrization model. As of last, we develop a mapping function that allows transfer expressions and microexpressions between different meshes, which provides realistic animations to our detailed mesh. We cover all the mentioned points with the focus on key aspects as (i) how to describe skin wrinkles in a simple and straightforward manner, (ii) how to recover 3D from 2D detections, (iii) how to recover and model facial hair from 2D to 3D, (iv) how to transfer expressions between models holding both skin detail and facial hair, (v) how to perform all the described actions without training data nor user interaction. In this work, we present our proposals to solve these aspects with an efficient and simple setup. We validate our work with several datasets both synthetic and real data, prooving remarkable results even in challenging cases as occlusions as glasses, thick beards, and indeed working with different face topologies like single-eyed cyclops.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Felipe Lumbreras;Antonio Agudo
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-3-0 Medium
Area Expedition Conference (up)
Notes ADAS Approved no
Call Number Admin @ si @ Rot2021 Serial 3513
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
Title Sign Language Recognition: A Deep Survey Type Journal Article
Year 2021 Publication Expert Systems With Applications Abbreviated Journal ESWA
Volume 164 Issue Pages 113794
Keywords
Abstract Sign language, as a different form of the communication language, is important to large groups of people in society. There are different signs in each sign language with variability in hand shape, motion profile, and position of the hand, face, and body parts contributing to each sign. So, visual sign language recognition is a complex research area in computer vision. Many models have been proposed by different researchers with significant improvement by deep learning approaches in recent years. In this survey, we review the vision-based proposed models of sign language recognition using deep learning approaches from the last five years. While the overall trend of the proposed models indicates a significant improvement in recognition accuracy in sign language recognition, there are some challenges yet that need to be solved. We present a taxonomy to categorize the proposed models for isolated and continuous sign language recognition, discussing applications, datasets, hybrid models, complexity, and future lines of research in the field.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ RKE2021a Serial 3521
Permanent link to this record
 

 
Author Jun Wan; Chi Lin; Longyin Wen; Yunan Li; Qiguang Miao; Sergio Escalera; Gholamreza Anbarjafari; Isabelle Guyon; Guodong Guo; Stan Z. Li
Title ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition Type Journal Article
Year 2022 Publication IEEE Transactions on Cybernetics Abbreviated Journal TCIBERN
Volume 52 Issue 5 Pages 3422-3433
Keywords
Abstract The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams round the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of 8.1% (from 0.8917 to 0.9639) of CSR.
Address May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes HUPBA; no menciona Approved no
Call Number Admin @ si @ WLW2022 Serial 3522
Permanent link to this record
 

 
Author Ajian Liu; Xuan Li; Jun Wan; Yanyan Liang; Sergio Escalera; Hugo Jair Escalante; Meysam Madadi; Yi Jin; Zhuoyuan Wu; Xiaogang Yu; Zichang Tan; Qi Yuan; Ruikun Yang; Benjia Zhou; Guodong Guo; Stan Z. Li
Title Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review Type Journal Article
Year 2020 Publication IET Biometrics Abbreviated Journal BIO
Volume 10 Issue 1 Pages 24-43
Keywords
Abstract Face anti-spoofing is critical to prevent face recognition systems from a security breach. The biometrics community has %possessed achieved impressive progress recently due the excellent performance of deep neural networks and the availability of large datasets. Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing. Recently, a multi-ethnic face anti-spoofing dataset, CASIA-SURF CeFA, has been released with the goal of measuring the ethnic bias. It is the largest up to date cross-ethnicity face anti-spoofing dataset covering 3 ethnicities, 3 modalities, 1,607 subjects, 2D plus 3D attack types, and the first dataset including explicit ethnic labels among the recently released datasets for face anti-spoofing. We organized the Chalearn Face Anti-spoofing Attack Detection Challenge which consists of single-modal (e.g., RGB) and multi-modal (e.g., RGB, Depth, Infrared (IR)) tracks around this novel resource to boost research aiming to alleviate the ethnic bias. Both tracks have attracted 340 teams in the development stage, and finally 11 and 8 teams have submitted their codes in the single-modal and multi-modal face anti-spoofing recognition challenges, respectively. All the results were verified and re-ran by the organizing team, and the results were used for the final ranking. This paper presents an overview of the challenge, including its design, evaluation protocol and a summary of results. We analyze the top ranked solutions and draw conclusions derived from the competition. In addition we outline future work directions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference (up)
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ LLW2020b Serial 3523
Permanent link to this record