Jun Wan, Sergio Escalera, Gholamreza Anbarjafari, Hugo Jair Escalante, Xavier Baro, Isabelle Guyon, et al. (2017). Results and Analysis of ChaLearn LAP Multi-modal Isolated and Continuous Gesture Recognition, and Real versus Fake Expressed Emotions Challenges. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
Abstract: We analyze the results of the 2017 ChaLearn Looking at People Challenge at ICCV. The challenge comprised three tracks: (1) large-scale isolated gesture recognition, (2) continuous gesture recognition, and (3) real versus fake expressed emotion recognition. This is the second round for both gesture recognition tracks, which were first held in the context of the ICPR 2016 workshop on “multimedia challenges beyond visual analysis”. In this second round, more participants joined the competitions, and performance improved considerably compared to the first round. In particular, the best isolated gesture recognition accuracy improved from 56.90% to 67.71% on the IsoGD test set, and the Mean Jaccard Index (MJI) for continuous gesture recognition improved from 0.2869 to 0.6103 on the ConGD test set. The third track is the first challenge on real versus fake expressed emotion classification, covering six emotion categories, for which a novel database was introduced. First place was shared between two teams, who achieved a 67.70% average recognition rate on the test set. The data of the three tracks, the participants' code and the method descriptions are publicly available to allow researchers to keep making progress in the field.
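For reference, the continuous track's Mean Jaccard Index measures the frame-level overlap between predicted and ground-truth gesture intervals, averaged over gesture instances and sequences. A sketch of the standard definition in LaTeX (notation introduced here, not quoted from the paper):

\[
J_{s,n} = \frac{|G_{s,n} \cap P_{s,n}|}{|G_{s,n} \cup P_{s,n}|},
\qquad
\mathrm{MJI} = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{|N_s|} \sum_{n \in N_s} J_{s,n},
\]

where \(G_{s,n}\) and \(P_{s,n}\) are the sets of frames annotated, respectively predicted, as gesture instance \(n\) in video sequence \(s\), and \(N_s\) is the set of gesture instances in sequence \(s\).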
|
Yagmur Gucluturk, Umut Guclu, Marc Perez, Hugo Jair Escalante, Xavier Baro, Isabelle Guyon, et al. (2017). Visualizing Apparent Personality Analysis with Deep Residual Networks. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV (pp. 3101–3109).
Abstract: Automatic prediction of personality traits is a subjective task that has recently received much attention. Specifically, automatic apparent personality trait prediction from multimodal data has emerged as a hot topic within the field of computer vision and, more particularly, the so-called “looking at people” sub-field. Considering “apparent” personality traits, as opposed to real ones, considerably reduces the subjectivity of the task. Real-world applications are encountered in a wide range of domains, including entertainment, health, human-computer interaction, recruitment and security. Predictive models of personality traits are useful for individuals in many scenarios (e.g., preparing for job interviews or for public speaking). However, these predictions in and of themselves might be deemed untrustworthy without human-understandable supporting evidence. Through a series of experiments on a recently released benchmark dataset for automatic apparent personality trait prediction, this paper characterizes the audio and visual information that a state-of-the-art model uses while making its predictions, so as to provide such supporting evidence by explaining the predictions made. Additionally, the paper describes a new web application, which gives feedback on the apparent personality traits of its users by combining model predictions with their explanations.
|
Maryam Asadi-Aghbolaghi, Hugo Bertiche, Vicent Roig, Shohreh Kasaei, & Sergio Escalera. (2017). Action Recognition from RGB-D Data: Comparison and Fusion of Spatio-temporal Handcrafted Features and Deep Strategies. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
|
Albert Clapes, Tinne Tuytelaars, & Sergio Escalera. (2017). Darwintrees for action recognition. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
|
Raul Gomez, Baoguang Shi, Lluis Gomez, Lukas Neumann, Andreas Veit, Jiri Matas, et al. (2017). ICDAR2017 Robust Reading Challenge on COCO-Text. In 14th International Conference on Document Analysis and Recognition.
|
Masakazu Iwamura, Naoyuki Morimoto, Keishi Tainaka, Dena Bazazian, Lluis Gomez, & Dimosthenis Karatzas. (2017). ICDAR2017 Robust Reading Challenge on Omnidirectional Video. In 14th International Conference on Document Analysis and Recognition.
Abstract: Results of the ICDAR 2017 Robust Reading Challenge on Omnidirectional Video are presented. This competition uses the Downtown Osaka Scene Text (DOST) Dataset, which was captured in Osaka, Japan with an omnidirectional camera and hence consists of sequential images (videos) from different view angles. Regarding the sequential images as videos (video mode), two tasks are prepared: localisation and end-to-end recognition. Regarding them as a set of still images (still image mode), three tasks are prepared: localisation, cropped word recognition and end-to-end recognition. As the dataset was captured in Japan, it contains Japanese text but also includes text consisting of alphanumeric characters (Latin text). Hence, a submitted result for each task is evaluated in three ways: using Japanese-only ground truth (GT), using Latin-only GT, and using the combined GTs of both. By the submission deadline, we had received two submissions in the text localisation task of the still image mode. We intend to continue the competition in the open mode. Expecting further submissions, in this report we provide baseline results for all the tasks in addition to the submissions from the community.
|
Laura Lopez-Fuentes, Claudio Rossi, & Harald Skinnemoen. (2017). River segmentation for flood monitoring. In Data Science for Emergency Management at Big Data 2017.
Abstract: Floods are major natural disasters which cause deaths and material damage every year. Monitoring these events is crucial in order to reduce both the number of affected people and the economic losses. In this work we train and test three different deep learning segmentation algorithms to estimate the water area from river images, and compare their performance. We discuss the implementation of a novel data chain aimed at monitoring river water levels by automatically processing data collected from surveillance cameras, and at giving alerts in case of sharp increases in the water level or flooding. We also create and openly publish the first image dataset for river water segmentation.
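To illustrate the monitoring step, a minimal Python sketch (assuming numpy) of turning a per-pixel water probability map into a level indicator and an alert; the threshold, margin and function names are ours, not the paper's:

import numpy as np

def water_fraction(prob_map, threshold=0.5):
    # Fraction of pixels classified as water in a per-pixel probability map.
    return float((np.asarray(prob_map) > threshold).mean())

def should_alert(current_fraction, baseline_fraction, margin=0.15):
    # Flag a potential flood when the water fraction rises well above the
    # camera's historical baseline; the margin here is purely illustrative.
    return current_fraction - baseline_fraction > margin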
|
Suman Ghosh, & Ernest Valveny. (2017). R-PHOC: Segmentation-Free Word Spotting using CNN. In 14th International Conference on Document Analysis and Recognition.
Abstract (arXiv:1707.01294): This paper proposes a region-based convolutional neural network for segmentation-free word spotting. Our network takes as input an image and a set of candidate word bounding boxes and embeds all bounding boxes into an embedding space, where word spotting can be cast as a simple nearest neighbour search between the query representation and each of the candidate bounding boxes. We make use of the PHOC embedding, as it has previously achieved significant success in segmentation-based word spotting. Word candidates are generated using a simple procedure based on grouping connected components under some spatial constraints. Experiments show that R-PHOC, which operates on images directly, can improve the current state-of-the-art on the standard GW dataset and in some cases performs as well as PHOCNET, which was designed for segmentation-based word spotting.
Keywords: Convolutional neural network; Image segmentation; Artificial neural network; Nearest neighbor search
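To make the retrieval step concrete, here is a minimal Python sketch of PHOC-style word spotting by nearest neighbour search. The simplified PHOC below assigns each character to the pyramid region containing its centre, and the candidate embeddings stand in for the network's outputs; all names are illustrative, not the paper's code:

import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def phoc(word, levels=(1, 2, 3)):
    # Simplified PHOC: at level l the word is split into l regions and a bit
    # is set when a character's centre falls inside that region.
    word = word.lower()
    parts = []
    for l in levels:
        hist = np.zeros((l, len(ALPHABET)))
        for i, ch in enumerate(word):
            if ch in ALPHABET:
                region = min(int((i + 0.5) / len(word) * l), l - 1)
                hist[region, ALPHABET.index(ch)] = 1.0
        parts.append(hist.ravel())
    return np.concatenate(parts)

def spot(query, candidate_embeddings):
    # Rank candidate bounding boxes by cosine similarity to the query PHOC.
    q = phoc(query)
    E = np.asarray(candidate_embeddings)
    sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-8)
    return np.argsort(-sims)  # indices of candidates, best match first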
|
Suman Ghosh, & Ernest Valveny. (2017). Visual attention models for scene text recognition. In 14th International Conference on Document Analysis and Recognition.
Abstract (arXiv:1706.01487): In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on a LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors is derived from an intermediate convolutional layer, corresponding to different areas of the image. This permits encoding spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character, using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word-level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on the standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
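The attention step described above is the usual soft attention computation: score each spatial feature vector against the current decoder state, normalize the scores with a softmax, and take the weighted sum as the context used to emit the next character. A minimal additive-attention sketch in Python under that reading (the parametrization is an assumption, not necessarily the paper's exact model):

import numpy as np

def attention_step(features, hidden, W_f, W_h, v):
    # features: (N, D) vectors from an intermediate conv layer, one per image area
    # hidden:   (H,)   current LSTM hidden state
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (N,) one score per area
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over image areas
    context = weights @ features                         # (D,) weighted combination
    return context, weights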
|
Konstantia Georgouli, Katerine Diaz, Jesus Martinez del Rincon, & Anastasios Koidis. (2017). Building generic, easily-updatable chemometric models with harmonisation and augmentation features: The case of FTIR vegetable oils classification. In 3rd International Conference Metrology Promoting Standardization and Harmonization in Food and Nutrition.
|
Dena Bazazian, Dimosthenis Karatzas, & Andrew Bagdanov. (2018). Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images. In International Workshop on Egocentric Perception, Interaction and Computing at ECCV.
Abstract: Word spotting in natural scene images has many applications in scene understanding and visual assistance. We propose Soft-PHOC, an intermediate representation of images based on character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We show how to use our descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm. This is based on the Hough Transform over character attribute maps, followed by scoring using Dynamic Time Warping (DTW). We evaluate our results on the ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.
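The line-scoring step relies on classic dynamic time warping between the query word's character sequence and the character probability columns along a proposed text line. A generic DTW sketch in Python (the distance function and inputs are placeholders, not the paper's exact setup):

import numpy as np

def dtw(a, b, dist=lambda x, y: float(np.linalg.norm(x - y))):
    # Minimal-cost monotonic alignment between two sequences of vectors,
    # e.g. a query's one-hot characters vs. per-column character probabilities.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]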
|
Jorge Bernal, Aymeric Histace, Marc Masana, Quentin Angermann, Cristina Sanchez Montes, Cristina Rodriguez de Miguel, et al. (2018). Polyp Detection Benchmark in Colonoscopy Videos using GTCreator: A Novel Fully Configurable Tool for Easy and Fast Annotation of Image Databases. In 32nd International Congress and Exhibition on Computer Assisted Radiology & Surgery.
|
Albert Berenguel, Oriol Ramos Terrades, Josep Llados, & Cristina Cañero. (2017). Evaluation of Texture Descriptors for Validation of Counterfeit Documents. In 14th International Conference on Document Analysis and Recognition (pp. 1237–1242).
Abstract: This paper describes an exhaustive comparative analysis and evaluation of existing texture descriptor algorithms for differentiating between genuine and counterfeit documents. We include in our experiments different categories of algorithms and compare them in different scenarios with several counterfeit datasets, comprising banknotes and identity documents. The computational time of extracting each descriptor is important, because the final objective is to use it in a real industrial scenario. HoG- and CNN-based descriptors stand out statistically over the rest in terms of the F1-score/time ratio.
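The ranking criterion is simply F1-score divided by extraction time; a tiny Python sketch of how such a comparison could be tabulated (descriptor names and numbers are placeholders, not the paper's measurements):

def rank_by_f1_per_time(results):
    # results: {descriptor_name: (f1_score, extraction_seconds_per_image)}
    return sorted(results, key=lambda k: results[k][0] / results[k][1], reverse=True)

# Illustrative usage with made-up values:
print(rank_by_f1_per_time({"HoG": (0.90, 0.02), "LBP": (0.85, 0.03)}))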
|
Mohamed Ilyes Lakhal, Hakan Cevikalp, & Sergio Escalera. (2018). CRN: End-to-end Convolutional Recurrent Network Structure Applied to Vehicle Classification. In 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5, pp. 137–144).
Abstract: Vehicle type classification is considered to be a central part of Intelligent Traffic Systems. In recent years, deep learning methods have emerged as the state of the art in many computer vision tasks. In this paper, we present a novel yet simple deep learning framework for the vehicle type classification problem. We propose an end-to-end trainable system that combines a convolutional neural network for feature extraction with a recurrent neural network as a classifier. The recurrent network structure is used to handle various types of feature inputs, and at the same time allows producing either a single class prediction or a set of them. In order to assess the effectiveness of our solution, we have conducted a set of experiments on two public datasets, obtaining state-of-the-art results. In addition, we also report results on the newly released MIO-TCD dataset.
Keywords: Vehicle Classification; Deep Learning; End-to-end Learning
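To illustrate the convolution-plus-recurrence structure, a minimal PyTorch sketch in the spirit of the described system: convolutional features are read as a spatial sequence by a GRU whose final hidden state feeds a linear classifier. Layer sizes and the exact wiring are illustrative assumptions, not the paper's architecture:

import torch
import torch.nn as nn

class CRN(nn.Module):
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        # Small CNN backbone producing a grid of 64-dimensional features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.rnn = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                    # x: (B, 3, H, W)
        f = self.cnn(x)                      # (B, 64, H', W')
        seq = f.flatten(2).transpose(1, 2)   # (B, H'*W', 64): features as a sequence
        _, h = self.rnn(seq)                 # h: (1, B, hidden), final GRU state
        return self.fc(h[-1])                # (B, num_classes) class logits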
|
Chun Yang, Xu-Cheng Yin, Hong Yu, Dimosthenis Karatzas, & Yu Cao. (2017). ICDAR2017 Robust Reading Challenge on Text Extraction from Biomedical Literature Figures (DeTEXT). In 14th International Conference on Document Analysis and Recognition (pp. 1444–1447).
Abstract: Hundreds of millions of figures are available in the biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information and understanding biomedical documents. Unlike images in the open domain, biomedical figures present a variety of unique challenges. For example, biomedical figures typically have complex layouts, small font sizes, short text, specific text, complex symbols and irregular text arrangements. This paper presents the final results of the ICDAR 2017 Competition on Text Extraction from Biomedical Literature Figures (ICDAR2017 DeTEXT Competition), which aims at extracting (detecting and recognizing) text from biomedical literature figures. Similar to text extraction from scene images and web pictures, the ICDAR2017 DeTEXT Competition includes three major tasks, i.e., text detection, cropped word recognition and end-to-end text recognition. Here, we describe in detail the data set, tasks, evaluation protocols and participants of this competition, and report the performance of the participating methods.
|