Author David Berga; Xavier Otazu
  Title A neurodynamic model of saliency prediction in V1 Type Journal Article
  Year 2022 Publication Neural Computation Abbreviated Journal NEURALCOMPUT
  Volume 34 Issue 2 Pages 378-414
  Abstract Lateral connections in the primary visual cortex (V1) have long been hypothesized to be responsible for several visual processing mechanisms such as brightness induction, chromatic induction, visual discomfort, and bottom-up visual attention (also named saliency). Many computational models have been developed to independently predict these and other visual processes, but no computational model has been able to reproduce all of them simultaneously. In this work, we show that a biologically plausible computational model of lateral interactions of V1 is able to simultaneously predict saliency and all the aforementioned visual processes. Our model's architecture (NSWAM) is based on Penacchio's neurodynamic model of lateral connections of V1. It is defined as a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation, and scale. We tested NSWAM saliency predictions using images from several eye tracking data sets. We show that the accuracy of predictions obtained by our architecture, using shuffled metrics, is similar to other state-of-the-art computational methods, particularly with synthetic images (CAT2000-Pattern and SID4VAM) that mainly contain low-level features. Moreover, we outperform other biologically inspired saliency models that are specifically designed to exclusively reproduce saliency. We show that our biologically plausible model of lateral connections can simultaneously explain different visual processes present in V1 (without applying any type of training or optimization and keeping the same parameterization for all the visual processes). This can be useful for the definition of a unified architecture of the primary visual cortex.  
  Notes NEUROBIT; 600.128; 600.120 Approved no
  Call Number Admin @ si @ BeO2022 Serial 3696
 

 
Author Zhen Xu; Sergio Escalera; Adrien Pavao; Magali Richard; Wei-Wei Tu; Quanming Yao; Huan Zhao; Isabelle Guyon
  Title Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform Type Journal Article
  Year 2022 Publication Patterns Abbreviated Journal PATTERNS
  Volume 3 Issue 7 Pages 100543
  Keywords Machine learning; data science; benchmark platform; reproducibility; competitions
  Abstract Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.  
  Address June 24, 2022
  Publisher Science Direct
  Notes HuPBA Approved no
  Call Number Admin @ si @ XEP2022 Serial 3764
 

 
Author Penny Tarling; Mauricio Cantor; Albert Clapes; Sergio Escalera
  Title Deep learning with self-supervision and uncertainty regularization to count fish in underwater images Type Journal Article
  Year 2022 Publication PLoS ONE Abbreviated Journal Plos
  Volume 17 Issue 5 Pages e0267759
  Abstract Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive but created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored to count animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework through testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals thereby contributing effective methods to assess natural populations from the ever-increasing visual data.  
  Publisher Public Library of Science
  Notes HuPBA Approved no
  Call Number Admin @ si @ TCC2022 Serial 3743
 

 
Author Pau Riba; Lutz Goldmann; Oriol Ramos Terrades; Diede Rusticus; Alicia Fornes; Josep Llados
  Title Table detection in business document images by message passing networks Type Journal Article
  Year 2022 Publication Pattern Recognition Abbreviated Journal PR
  Volume 127 Pages 108641
  Abstract Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, there is information about the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition have gained interest in document image analysis, in particular when rule lines are absent and information about rows and columns is unknown. However, business documents usually contain sensitive contents, limiting the number of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images that does not require the raw content of the document. Hence, the sensitive content can be previously removed and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain is business documents. We have carefully validated our approach in two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.
  Address July 2022
  Publisher Elsevier
  Notes DAG; 600.162; 600.121 Approved no
  Call Number Admin @ si @ RGR2022 Serial 3729
 

 
Author Lei Kang; Pau Riba; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas
  Title Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition Type Journal Article
  Year 2022 Publication Pattern Recognition Abbreviated Journal PR
  Volume 129 Pages 108766
  Abstract The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios.  
  Address Sept. 2022
  Notes DAG; 600.121; 600.162 Approved no
  Call Number Admin @ si @ KRR2022 Serial 3556
 

 
Author S.K. Jemni; Mohamed Ali Souibgui; Yousri Kessentini; Alicia Fornes
  Title Enhance to Read Better: A Multi-Task Adversarial Network for Handwritten Document Image Enhancement Type Journal Article
  Year 2022 Publication Pattern Recognition Abbreviated Journal PR
  Volume 123 Pages 108370
  Abstract Handwritten document images can be highly affected by degradation for different reasons: paper ageing, daily-life scenarios (wrinkles, dust, etc.), bad scanning process and so on. These artifacts raise many readability issues for current Handwritten Text Recognition (HTR) algorithms and severely devalue their efficiency. In this paper, we propose an end-to-end architecture based on Generative Adversarial Networks (GANs) to recover the degraded documents into a clean and readable form. Unlike the most well-known document binarization methods, which try to improve the visual quality of the degraded document, the proposed architecture integrates a handwritten text recognizer that promotes the generated document image to be more readable. To the best of our knowledge, this is the first work to use the text information while binarizing handwritten documents. Extensive experiments conducted on degraded Arabic and Latin handwritten documents demonstrate the usefulness of integrating the recognizer within the GAN architecture, which improves both the visual quality and the readability of the degraded document images. Moreover, we outperform the state of the art in H-DIBCO challenges, after fine-tuning our pre-trained model with synthetically degraded Latin handwritten images, on this task.
  Notes DAG; 600.124; 600.121; 602.230 Approved no
  Call Number Admin @ si @ JSK2022 Serial 3613
 

 
Author Mohamed Ali Souibgui; Alicia Fornes; Yousri Kessentini; Beata Megyesi
  Title Few shots are all you need: A progressive learning approach for low resource handwritten text recognition Type Journal Article
  Year 2022 Publication Pattern Recognition Letters Abbreviated Journal PRL
  Volume 160 Pages 43-49
  Abstract Handwritten text recognition in low resource scenarios, such as manuscripts with rare alphabets, is a challenging problem. In this paper, we propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation process, by requiring only a few images of each alphabet symbol. The method consists of detecting all the symbols of a given alphabet in a textline image and decoding the obtained similarity scores to the final sequence of transcribed symbols. Our model is first pretrained on synthetic line images generated from an alphabet, which could differ from the alphabet of the target domain. A second training step is then applied to reduce the gap between the source and the target data. Since this retraining would require annotation of thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach that automatically assigns pseudo-labels to the unlabeled data. The evaluation on different datasets shows that our model can lead to competitive results with a significant reduction in human effort. The code will be publicly available in the following repository: https://github.com/dali92002/HTRbyMatching
  Publisher Elsevier
  Notes DAG; 600.121; 600.162; 602.230 Approved no
  Call Number Admin @ si @ SFK2022 Serial 3736
 

 
Author Victor M. Campello; Carlos Martin-Isla; Cristian Izquierdo; Andrea Guala; Jose F. Rodriguez Palomares; David Vilades; Martin L. Descalzo; Mahir Karakas; Ersin Cavus; Zahra Raisi-Estabragh; Steffen E. Petersen; Sergio Escalera; Santiago Segui; Karim Lekadir
  Title Minimising multi-centre radiomics variability through image normalisation: a pilot study Type Journal Article
  Year 2022 Publication Scientific Reports Abbreviated Journal ScR
  Volume 12 Issue 1 Pages 12532
  Abstract Radiomics is an emerging technique for the quantification of imaging data that has recently shown great promise for deeper phenotyping of cardiovascular disease. Thus far, the technique has been mostly applied in single-centre studies. However, one of the main difficulties in multi-centre imaging studies is the inherent variability of image characteristics due to centre differences. In this paper, a comprehensive analysis of radiomics variability under several image- and feature-based normalisation techniques was conducted using a multi-centre cardiovascular magnetic resonance dataset. 218 subjects divided into healthy (n = 112) and hypertrophic cardiomyopathy (n = 106, HCM) groups from five different centres were considered. First and second order texture radiomic features were extracted from three regions of interest, namely the left and right ventricular cavities and the left ventricular myocardium. Two methods were used to assess features’ variability. First, feature distributions were compared across centres to obtain a distribution similarity index. Second, two classification tasks were proposed to assess: (1) the amount of centre-related information encoded in normalised features (centre identification) and (2) the generalisation ability for a classification model when trained on these features (healthy versus HCM classification). The results showed that the feature-based harmonisation technique ComBat is able to remove the variability introduced by centre information from radiomic features, at the expense of slightly degrading classification performance. Piecewise linear histogram matching normalisation gave features with greater generalisation ability for classification (balanced accuracy between 0.78 ± 0.08 and 0.79 ± 0.09). Models trained with features from images without normalisation showed the worst performance overall (balanced accuracy between 0.45 ± 0.28 and 0.60 ± 0.22). In conclusion, centre-related information removal did not imply good generalisation ability for classification.
  Address 2022/07/22
  Publisher Springer Nature
  Notes HuPBA Approved no
  Call Number Admin @ si @ CMI2022 Serial 3749
 

 
Author Idoia Ruiz; Joan Serrat
  Title Hierarchical Novelty Detection for Traffic Sign Recognition Type Journal Article
  Year 2022 Publication Sensors Abbreviated Journal SENS
  Volume 22 Issue 12 Pages 4389
  Keywords Novelty detection; hierarchical classification; deep learning; traffic sign recognition; autonomous driving; computer vision
  Abstract Recent works have made significant progress in novelty detection, i.e., the problem of detecting samples of novel classes, never seen during training, while classifying those that belong to known classes. However, the only information this task provides about novel samples is that they are unknown. In this work, we leverage hierarchical taxonomies of classes to provide informative outputs for samples of novel classes. We predict their closest class in the taxonomy, i.e., its parent class. We address this problem, known as hierarchical novelty detection, by proposing a novel loss, namely Hierarchical Cosine Loss, that is designed to learn class prototypes along with an embedding of discriminative features consistent with the taxonomy. We apply it to traffic sign recognition, where we predict the parent class semantics for new types of traffic signs. Our model beats state-of-the-art approaches on two large scale traffic sign benchmarks, Mapillary Traffic Sign Dataset (MTSD) and Tsinghua-Tencent 100K (TT100K), and performs similarly on natural images benchmarks (AWA2, CUB). For TT100K and MTSD, our approach is able to detect novel samples at the correct nodes of the hierarchy with 81% and 36% accuracy, respectively, at 80% known class accuracy.
  Notes ADAS; 600.154 Approved no
  Call Number Admin @ si @ RuS2022 Serial 3684
 

 
Author Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud
  Title A Novel Domain Transfer-Based Approach for Unsupervised Thermal Image Super-Resolution Type Journal Article
  Year 2022 Publication Sensors Abbreviated Journal SENS
  Volume 22 Issue 6 Pages 2254
  Keywords Thermal image super-resolution; unsupervised super-resolution; thermal images; attention module; semiregistered thermal images
  Abstract This paper presents a transfer domain strategy to tackle the limitations of low-resolution thermal sensors and generate higher-resolution images of reasonable quality. The proposed technique employs a CycleGAN architecture and uses a ResNet as an encoder in the generator along with an attention module and a novel loss function. The network is trained on a multi-resolution thermal image dataset acquired with three different thermal sensors. Results report better performance benchmarking results on the 2nd CVPR-PBVS-2021 thermal image super-resolution challenge than state-of-the-art methods. The code of this work is available online.  
  Notes MSIAU Approved no
  Call Number Admin @ si @ RSV2022b Serial 3688
 

 
Author Saad Minhas; Zeba Khanam; Shoaib Ehsan; Klaus McDonald Maier; Aura Hernandez-Sabate
  Title Weather Classification by Utilizing Synthetic Data Type Journal Article
  Year 2022 Publication Sensors Abbreviated Journal SENS
  Volume 22 Issue 9 Pages 3193
  Keywords Weather classification; synthetic data; dataset; autonomous car; computer vision; advanced driver assistance systems; deep learning; intelligent transportation systems
  Abstract Weather prediction from real-world images can be termed a complex task when targeting classification using neural networks. Moreover, the number of images throughout the available datasets can contain a huge amount of variance when comparing locations with the weather those images are representing. In this article, the capabilities of a custom built driver simulator are explored specifically to simulate a wide range of weather conditions. Moreover, the performance of a new synthetic dataset generated by the above simulator is also assessed. The results indicate that the use of synthetic datasets in conjunction with real-world datasets can increase the training efficiency of the CNNs by as much as 74%. The article paves a way forward to tackle the persistent problem of bias in vision-based datasets.  
  Address 21 April 2022
  Publisher MDPI
  Notes IAM; 600.139; 600.159; 600.166; 600.145 Approved no
  Call Number Admin @ si @ MKE2022 Serial 3761
 

 
Author Hugo Jair Escalante; Heysem Kaya; Albert Ali Salah; Sergio Escalera; Yagmur Gucluturk; Umut Guçlu; Xavier Baro; Isabelle Guyon; Julio C. S. Jacques Junior; Meysam Madadi; Stephane Ayache; Evelyne Viegas; Furkan Gurpinar; Achmadnoer Sukma Wicaksana; Cynthia Liem; Marcel A. J. Van Gerven; Rob Van Lier
  Title Modeling, Recognizing, and Explaining Apparent Personality from Videos Type Journal Article
  Year 2022 Publication IEEE Transactions on Affective Computing Abbreviated Journal TAC
  Volume 13 Issue 2 Pages 894-911
  Abstract Explainability and interpretability are two critical aspects of decision support systems. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of apparent personality recognition. To the best of our knowledge, this is the first effort in this direction. We describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, evaluation protocol, proposed solutions and summarize the results of the challenge. We investigate the issue of bias in detail. Finally, derived from our study, we outline research opportunities that we foresee will be relevant in this area in the near future.  
  Address 1 April-June 2022
  Notes HuPBA; does not mention Approved no
  Call Number Admin @ si @ EKS2022 Serial 3406
 

 
Author Julio C. S. Jacques Junior; Yagmur Gucluturk; Marc Perez; Umut Guçlu; Carlos Andujar; Xavier Baro; Hugo Jair Escalante; Isabelle Guyon; Marcel A. J. van Gerven; Rob van Lier; Sergio Escalera
  Title First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis Type Journal Article
  Year 2022 Publication IEEE Transactions on Affective Computing Abbreviated Journal TAC
  Volume 13 Issue 1 Pages 75-95
  Keywords Personality computing; first impressions; person perception; big-five; subjective bias; computer vision; machine learning; nonverbal signals; facial expression; gesture; speech analysis; multi-modal recognition
  Abstract Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. Over the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of method could have in society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research on the field, are reviewed.
  Address 1 Jan.-March 2022
  Notes HuPBA Approved no
  Call Number Admin @ si @ JGP2022 Serial 3724
 

 
Author Jun Wan; Chi Lin; Longyin Wen; Yunan Li; Qiguang Miao; Sergio Escalera; Gholamreza Anbarjafari; Isabelle Guyon; Guodong Guo; Stan Z. Li
  Title ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition Type Journal Article
  Year 2022 Publication IEEE Transactions on Cybernetics Abbreviated Journal TCIBERN
  Volume 52 Issue 5 Pages 3422-3433
  Abstract The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams round the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of 8.1% (from 0.8917 to 0.9639) of CSR.  
  Address May 2022
  Notes HUPBA; does not mention Approved no
  Call Number Admin @ si @ WLW2022 Serial 3522
 

 
Author Ajian Liu; Chenxu Zhao; Zitong Yu; Jun Wan; Anyang Su; Xing Liu; Zichang Tan; Sergio Escalera; Junliang Xing; Yanyan Liang; Guodong Guo; Zhen Lei; Stan Z. Li; Shenshen Du
  Title Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection Type Journal Article
  Year 2022 Publication IEEE Transactions on Information Forensics and Security Abbreviated Journal TIForensicSEC
  Volume 17 Pages 2497-2507
  Abstract Face presentation attack detection (PAD) is essential to secure face recognition systems primarily from high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, types of sensors, and a total number of videos; 2) low-fidelity quality of facial masks. Basic deep models and remote photoplethysmography (rPPG) methods achieved acceptable performance on these benchmarks but still far from the needs of practical scenarios. To bridge the gap to real-world applications, we introduce a large-scale High-Fidelity Mask dataset, namely HiFiMask. Specifically, a total amount of 54,600 videos are recorded from 75 subjects with 225 realistic masks by 7 new kinds of sensors. Along with the dataset, we propose a novel Contrastive Context-aware Learning (CCL) framework. CCL is a new training methodology for supervised PAD tasks, which is able to learn by leveraging rich contexts accurately (e.g., subjects, mask material and lighting) among pairs of live faces and high-fidelity mask attacks. Extensive experimental evaluations on HiFiMask and three additional 3D mask datasets demonstrate the effectiveness of our method. The codes and dataset will be released soon.
  Publisher IEEE
  Notes HuPBA Approved no
  Call Number Admin @ si @ LZY2022 Serial 3778