Alejandro Cartas, Petia Radeva, & Mariella Dimiccoli. (2020). Activities of Daily Living Monitoring via a Wearable Camera: Toward Real-World Applications. ACCESS - IEEE Access, 8, 77344–77363.
Abstract: Activity recognition from wearable photo-cameras is crucial for lifestyle characterization and health monitoring. However, to enable its wide-spreading use in real-world applications, a high level of generalization needs to be ensured on unseen users. Currently, state-of-the-art methods have been tested only on relatively small datasets consisting of data collected by a few users that are partially seen during training. In this paper, we built a new egocentric dataset acquired by 15 people through a wearable photo-camera and used it to test the generalization capabilities of several state-of-the-art methods for egocentric activity recognition on unseen users and daily image sequences. In addition, we propose several variants to state-of-the-art deep learning architectures, and we show that it is possible to achieve 79.87% accuracy on users unseen during training. Furthermore, to show that the proposed dataset and approach can be useful in real-world applications, where data can be acquired by different wearable cameras and labeled data are scarcely available, we employed a domain adaptation strategy on two egocentric activity recognition benchmark datasets. These experiments show that the model learned with our dataset, can easily be transferred to other domains with a very small amount of labeled data. Taken together, those results show that activity recognition from wearable photo-cameras is mature enough to be tested in real-world applications.
|
Debora Gil, Antonio Esteban Lansaque, Agnes Borras, Esmitt Ramirez, & Carles Sanchez. (2020). Intraoperative Extraction of Airways Anatomy in VideoBronchoscopy. ACCESS - IEEE Access, 8, 159696–159704.
Abstract: A main bottleneck in bronchoscopic biopsy sampling is to efficiently reach the lesion navigating across bronchial levels. Any guidance system should be able to localize the scope position during the intervention with minimal costs and alteration of clinical protocols. With the final goal of an affordable image-based guidance, this work presents a novel strategy to extract and codify the anatomical structure of bronchi, as well as, the scope navigation path from videobronchoscopy. Experiments using interventional data show that our method accurately identifies the bronchial structure. Meanwhile, experiments using simulated data verify that the extracted navigation path matches the 3D route.
|
Giovanni Maria Farinella, Petia Radeva, & Jose Braz. (2020). Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications (Vol. 5).
|
Giovanni Maria Farinella, Petia Radeva, & Jose Braz. (2020). Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications (Vol. 4).
|
Shifeng Zhang, Ajian Liu, Jun Wan, Yanyan Liang, Guogong Guo, Sergio Escalera, et al. (2020). CASIA-SURF: A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing. TTBIS - IEEE Transactions on Biometrics, Behavior, and Identity Science, 182–193.
Abstract: Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progresses have been made by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have limited number of subjects (≤170) and modalities (≤2), which hinder the further development of the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and modalities. Specifically, it consists of 1,000 subjects with 21,000 videos and each sample has 3 modalities (i.e., RGB, Depth and IR). We also provide comprehensive evaluation metrics, diverse evaluation protocols, training/validation/testing subsets and a measurement tool, developing a new benchmark for face anti-spoofing. Moreover, we present a novel multi-modal multi-scale fusion method as a strong baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality across different scales. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at https://sites.google.com/qq.com/face-anti-spoofing/welcome/challengecvpr2019?authuser=0
|
Aymen Azaza, Joost Van de Weijer, Ali Douik, Javad Zolfaghari Bengar, & Marc Masana. (2020). Saliency from High-Level Semantic Image Features. SN - SN Computer Science, 1–12.
Abstract: Top-down semantic information is known to play an important role in assigning saliency. Recently, large strides have been made in improving state-of-the-art semantic image understanding in the fields of object detection and semantic segmentation. Therefore, since these methods have now reached a high-level of maturity, evaluation of the impact of high-level image understanding on saliency estimation is now feasible. We propose several saliency features which are computed from object detection and semantic segmentation results. We combine these features with a standard baseline method for saliency detection to evaluate their importance. Experiments demonstrate that the proposed features derived from object detection and semantic segmentation improve saliency estimation significantly. Moreover, they show that our method obtains state-of-the-art results on (FT, ImgSal, and SOD datasets) and obtains competitive results on four other datasets (ECSSD, PASCAL-S, MSRA-B, and HKU-IS).
|
Lei Kang, Marçal Rusiñol, Alicia Fornes, Pau Riba, & Mauricio Villegas. (2020). Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition. In IEEE Winter Conference on Applications of Computer Vision.
Abstract: Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions in real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance, thus, it provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step.
|
Raul Gomez, Jaume Gibert, Lluis Gomez, & Dimosthenis Karatzas. (2020). Exploring Hate Speech Detection in Multimodal Publications. In IEEE Winter Conference on Applications of Computer Vision.
Abstract: In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.
|
Edgar Riba, D. Mishkin, Daniel Ponsa, E. Rublee, & G. Bradski. (2020). Kornia: an Open Source Differentiable Computer Vision Library for PyTorch. In IEEE Winter Conference on Applications of Computer Vision.
|
Sergio Escalera, & Ralf Herbrich. (2020). The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations (Sergio Escalera, & Ralf Herbrich, Eds.).
Abstract: This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.
|
Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, & Dimosthenis Karatzas. (2020). Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features. In IEEE Winter Conference on Applications of Computer Vision.
Abstract: Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval.
|
Arnau Baro, Alicia Fornes, & Carles Badal. (2020). Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism. In 17th International Conference on Frontiers in Handwriting Recognition.
Abstract: Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks.
|
Alicia Fornes, Josep Llados, & Joana Maria Pujadas-Mora. (2020). Browsing of the Social Network of the Past: Information Extraction from Population Manuscript Images. In Handwritten Historical Document Analysis, Recognition, and Retrieval – State of the Art and Future Trends. World Scientific.
|
Jialuo Chen, M.A. Souibgui, Alicia Fornes, & Beata Megyesi. (2020). A Web-based Interactive Transcription Tool for Encrypted Manuscripts. In 3rd International Conference on Historical Cryptology (pp. 52–59).
Abstract: Manual transcription of handwritten text is a time consuming task. In the case of encrypted manuscripts, the recognition is even more complex due to the huge variety of alphabets and symbol sets. To speed up and ease this process, we present a web-based tool aimed to (semi)-automatically transcribe the encrypted sources. The user uploads one or several images of the desired encrypted document(s) as input, and the system returns the transcription(s). This process is carried out in an interactive fashion with the user to obtain more accurate results. For discovering and testing, the developed web tool is freely available.
|
David Berga, & Xavier Otazu. (2020). Computations of top-down attention by modulating V1 dynamics. In Computational and Mathematical Models in Vision.
|