|
Patricia Suarez, Angel Sappa, Boris X. Vintimilla, & Riad I. Hammoud. (2018). Near InfraRed Imagery Colorization. In 25th International Conference on Image Processing (pp. 2237–2241).
Abstract: This paper proposes a stacked conditional Generative Adversarial Network-based method for Near InfraRed (NIR) imagery colorization. We propose a variant architecture of Generative Adversarial Network (GAN) that uses multiple loss functions over a conditional probabilistic generative model. We show that this new architecture/loss-function yields better generalization and representation of the generated colored IR images. The proposed approach is evaluated on a large test dataset and compared to recent state-of-the-art methods using standard metrics.
Keywords: Convolutional Neural Networks (CNN), Generative Adversarial Network (GAN), Infrared Imagery colorization
|
|
|
Marco Buzzelli, Joost Van de Weijer, & Raimondo Schettini. (2018). Learning Illuminant Estimation from Object Recognition. In 25th International Conference on Image Processing (pp. 3234–3238).
Abstract: In this paper we present a deep learning method to estimate the illuminant of an image. Our model is not trained with illuminant annotations, but with the objective of improving performance on an auxiliary task such as object recognition. To the best of our knowledge, this is the first example of a deep learning architecture for illuminant estimation that is trained without ground truth illuminants. We evaluate our solution on standard datasets for color constancy, and compare it with state-of-the-art methods. Our proposal is shown to outperform most deep learning methods in a cross-dataset evaluation setup, and to present competitive results in a comparison with parametric solutions.
Keywords: Illuminant estimation; computational color constancy; semi-supervised learning; deep learning; convolutional neural networks
|
|
|
Jialuo Chen, Pau Riba, Alicia Fornes, Juan Mas, Josep Llados, & Joana Maria Pujadas-Mora. (2018). Word-Hunter: A Gamesourcing Experience to Validate the Transcription of Historical Manuscripts. In 16th International Conference on Frontiers in Handwriting Recognition (pp. 528–533).
Abstract: Nowadays, there are still many handwritten historical documents in archives waiting to be transcribed and indexed. Since manual transcription is tedious and time consuming, automatic transcription seems the path to follow. However, the performance of current handwriting recognition techniques is not perfect, so manual validation is mandatory. Crowdsourcing is a good strategy for manual validation; however, it is a tedious task. In this paper we analyze experiences based on gamification in order to propose and design a gamesourcing framework that increases the interest of users. Then, we describe and analyze our experience when validating the automatic transcription using the gamesourcing application. Moreover, thanks to the combination of clustering and handwriting recognition techniques, we can speed up the validation while maintaining the performance.
Keywords: Crowdsourcing; Gamification; Handwritten documents; Performance evaluation
|
|
|
Rain Eric Haamer, Kaustubh Kulkarni, Nasrin Imanpour, Mohammad Ahsanul Haque, Egils Avots, Michelle Breisch, et al. (2018). Changes in Facial Expression as Biometric: A Database and Benchmarks of Identification. In 8th International Workshop on Human Behavior Understanding.
Abstract: Facial dynamics can be considered as unique signatures for discrimination between people. They have become an important topic since many devices can be unlocked using face recognition or verification. In this work, we evaluate the efficacy of the transition frames of video in emotion as compared to the peak emotion frames for identification. For experiments with transition frames we extract features from each frame of the video from a fine-tuned VGG-Face Convolutional Neural Network (CNN) and geometric features from facial landmark points. To model the temporal context of the transition frames we train a Long Short-Term Memory (LSTM) on the geometric and the CNN features. Furthermore, we employ two fusion strategies: first, an early fusion, in which the geometric and the CNN features are stacked and fed to the LSTM; second, a late fusion, in which the predictions of the LSTMs, trained independently on the two features, are stacked and used with a Support Vector Machine (SVM). Experimental results show that the late fusion strategy gives the best results and that the transition frames give better identification results than the peak emotion frames.
|
|
|
Mohammad A. Haque, Ruben B. Bautista, Kamal Nasrollahi, Sergio Escalera, Christian B. Laursen, Ramin Irani, et al. (2018). Deep Multimodal Pain Recognition: A Database and Comparison of Spatio-Temporal Visual Modalities, Faces and Gestures. In 13th IEEE Conference on Automatic Face and Gesture Recognition (pp. 250–257).
Abstract: Pain is a symptom of many disorders associated with actual or potential tissue damage in the human body. Managing pain is not only a duty but also very costly. The most basic step of pain management is the assessment of pain. Traditionally it was accomplished by self-report or visual inspection by experts. However, automatic pain assessment systems from facial videos are also rapidly evolving due to the need to manage pain in a robust and cost-effective way. Among the different challenges of automatic pain assessment from facial video data, two issues are increasingly prevalent: first, exploiting both spatial and temporal information of the face to assess pain level, and second, incorporating multiple visual modalities to capture complementary face information related to pain. Most works in the literature focus merely on exploiting spatial information from chromatic (RGB) video data in shallow learning scenarios. However, employing deep learning techniques for spatio-temporal analysis considering Depth (D) and Thermal (T) along with RGB has high potential in this area. In this paper, we present the first state-of-the-art publicly available database, the 'Multimodal Intensity Pain (MIntPAIN)' database, for RGBDT pain level recognition in sequences. We provide first baseline results, including recognition of 5 pain levels, by analyzing independent visual modalities and their fusion with CNN and LSTM models. From the experimental evaluation we observe that fusion of modalities helps to enhance recognition performance of pain levels in comparison to isolated ones. In particular, the combination of RGB, D, and T in an early fusion fashion achieved the best recognition rate.
|
|
|
V. Poulain d'Andecy, Emmanuel Hartmann, & Marçal Rusiñol. (2018). Field Extraction by hybrid incremental and a-priori structural templates. In 13th IAPR International Workshop on Document Analysis Systems (pp. 251–256).
Abstract: In this paper, we present an incremental framework for extracting information fields from administrative documents. First, we demonstrate some limits of the existing state-of-the-art methods such as the delay of the system efficiency. This is a concern in an industrial context when we have only a few samples of each document class. Based on this analysis, we propose a hybrid system combining incremental learning by means of itf-df statistics and a-priori generic models. We report in the experimental section our results obtained with a dataset of real invoices.
Keywords: Layout Analysis; information extraction; incremental learning
|
|
|
David Aldavert, & Marçal Rusiñol. (2018). Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting. In 13th IAPR International Workshop on Document Analysis Systems (pp. 223–228).
Abstract: Word-spotting methods based on the Bag-of-Visual-Words framework have demonstrated a good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in this paper we present a database-agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also makes it easy to incorporate semantic information in the codebook generation. So, the proposed method is able to determine which set of codewords have a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains state-of-the-art performance while having a more compact representation.
Keywords: Word Spotting; Bag of Visual Words; Synthetic Codebook; Semantic Information
|
|
|
David Aldavert, & Marçal Rusiñol. (2018). Manuscript text line detection and segmentation using second-order derivatives analysis. In 13th IAPR International Workshop on Document Analysis Systems (pp. 293–298).
Abstract: In this paper, we explore the use of second-order derivatives to detect text lines on handwritten document images. Taking advantage of the fact that the second derivative gives a minimum response when a dark linear element over a bright background has the same orientation as the filter, we use this operator to create a map with the local orientation and strength of putative text lines in the document. Then, we detect line segments by selecting and merging the filter responses that have a similar orientation and scale. Finally, text lines are found by merging the segments that are within the same text region. The proposed segmentation algorithm is learning-free while showing a performance similar to the state-of-the-art methods on publicly available datasets.
Keywords: text line detection; text line segmentation; text region detection; second-order derivatives
|
|
|
Albert Clapes, Ozan Bilici, Dariia Temirova, Egils Avots, Gholamreza Anbarjafari, & Sergio Escalera. (2018). From apparent to real age: gender, age, ethnic, makeup, and expression bias analysis in real age estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 2373–2382).
|
|
|
Dena Bazazian, Dimosthenis Karatzas, & Andrew Bagdanov. (2018). Word Spotting in Scene Images based on Character Recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 1872–1874).
Abstract: In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images.
|
|
|
Bojana Gajic, & Ramon Baldrich. (2018). Cross-domain fashion image retrieval. In CVPR 2018 Workshop on Women in Computer Vision (WiCV 2018, 4th Edition) (pp. 19500–19502).
Abstract: Cross-domain image retrieval is a challenging task that implies matching images from one domain to their pairs from another domain. In this paper we focus on fashion image retrieval, which involves matching an image of a fashion item taken by users to the images of the same item taken under controlled conditions, usually by a professional photographer. When facing this problem, we have different products at training and test time, and we use triplet loss to train the network. We stress the importance of proper training of a simple architecture, as well as adapting general models to the specific task.
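The abstract above names triplet loss without giving its form. As a minimal sketch, the standard margin-based formulation is shown below; the squared Euclidean distance, the `margin` value, and the toy embeddings are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard margin-based triplet loss over batches of embeddings.

    anchor, positive, negative: arrays of shape (batch, dim).
    The margin value and the squared Euclidean distance are illustrative choices.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor to same item
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor to different item
    # Penalize triplets where the negative is not farther than the positive by the margin.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

# Toy embeddings (hypothetical): a user photo, the matching shop photo, another product.
user_photo = np.array([[0.0, 0.0]])
same_item = np.array([[0.0, 0.1]])
other_item = np.array([[1.0, 1.0]])

easy = triplet_loss(user_photo, same_item, other_item)   # already well separated
hard = triplet_loss(user_photo, other_item, same_item)   # roles swapped, so violated
```

In this toy case the well-separated triplet contributes no loss, while the swapped triplet is penalized by the distance gap plus the margin.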
|
|
|
Ilke Demir, Dena Bazazian, Adriana Romero, Viktoriia Sharmanska, & Lyne P. Tchapmi. (2018). WiCV 2018: The Fourth Women In Computer Vision Workshop. In 4th Women in Computer Vision Workshop (pp. 1941–19412).
Abstract: We present WiCV 2018 – Women in Computer Vision Workshop to increase the visibility and inclusion of women researchers in the computer vision field, organized in conjunction with CVPR 2018. Computer vision and machine learning have made incredible progress over the past years, yet the number of female researchers is still low both in academia and industry. WiCV is organized to raise visibility of female researchers, to increase collaboration, and to provide mentorship and give opportunities to female-identifying junior researchers in the field. In its fourth year, we are proud to present the changes and improvements over the past years and a summary of statistics for presenters and attendees, followed by expectations from future generations.
Keywords: Conferences; Computer vision; Industries; Object recognition; Engineering profession; Collaboration; Machine learning
|
|
|
Patricia Suarez, Angel Sappa, Boris X. Vintimilla, & Riad I. Hammoud. (2018). Deep Learning based Single Image Dehazing. In 31st IEEE Conference on Computer Vision and Pattern Recognition Workshop (pp. 1250–12507).
Abstract: This paper proposes a novel approach to remove haze degradations in RGB images using a stacked conditional Generative Adversarial Network (GAN). It employs a triplet of GANs to remove the haze on each color channel independently. A multiple loss functions scheme, applied over a conditional probabilistic model, is proposed. The proposed GAN architecture learns to remove the haze, using as conditioning input the hazy images from which the clear images will be obtained. Such a formulation ensures fast model training convergence and homogeneous model generalization. Experiments showed that the proposed method generates high-quality clear images.
Keywords: Gallium nitride; Atmospheric modeling; Generators; Generative adversarial networks; Convergence; Image color analysis
|
|
|
Adrian Galdran, Aitor Alvarez-Gila, Alessandro Bria, Javier Vazquez, & Marcelo Bertalmio. (2018). On the Duality Between Retinex and Image Dehazing. In 31st IEEE Conference on Computer Vision and Pattern Recognition (pp. 8212–8221).
Abstract: Image dehazing deals with the removal of undesired loss of visibility in outdoor images due to the presence of fog. Retinex is a color vision model mimicking the ability of the Human Visual System to robustly discount varying illuminations when observing a scene under different spectral lighting conditions. Retinex has been widely explored in the computer vision literature for image enhancement and other related tasks. While these two problems are apparently unrelated, the goal of this work is to show that they can be connected by a simple linear relationship. Specifically, most Retinex-based algorithms have the characteristic feature of always increasing image brightness, which turns them into ideal candidates for effective image dehazing by directly applying Retinex to a hazy image whose intensities have been inverted. In this paper, we give theoretical proof that Retinex on inverted intensities is a solution to the image dehazing problem. Comprehensive qualitative and quantitative results indicate that several classical and modern implementations of Retinex can be transformed into competing image dehazing algorithms performing on par with more complex fog removal methods, and can overcome some of the main challenges associated with this problem.
Keywords: Image color analysis; Task analysis; Atmospheric modeling; Computer vision; Computational modeling; Lighting
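The idea in the abstract above, applying Retinex to a hazy image whose intensities have been inverted, can be sketched as a small wrapper. This is a sketch under stated assumptions rather than the authors' implementation: inverting the result back to the original intensity range is assumed here, the function names are hypothetical, and `toy_brightening` is a placeholder that only mimics the "always increases brightness" property and is not a Retinex algorithm.

```python
import numpy as np

def dehaze_via_retinex(hazy, retinex):
    """Apply a brightness-increasing enhancer to the inverted hazy image.

    hazy:    float image with intensities in [0, 1].
    retinex: any callable mapping an image in [0, 1] to an enhanced image.
    Inverting the result back is an assumption of this sketch.
    """
    hazy = np.clip(np.asarray(hazy, dtype=np.float64), 0.0, 1.0)
    inverted = 1.0 - hazy                             # invert intensities
    enhanced = np.clip(retinex(inverted), 0.0, 1.0)   # enhance the inverted image
    return 1.0 - enhanced                             # map back to the original range

def toy_brightening(image):
    # Placeholder enhancer (NOT Retinex): sqrt never darkens values in [0, 1].
    return np.sqrt(image)

hazy = np.array([[0.90, 0.80], [0.70, 0.95]])  # bright, washed-out toy "image"
dehazed = dehaze_via_retinex(hazy, toy_brightening)
```

Because the enhancer never darkens the inverted image, the output is never brighter than the hazy input, which matches the intuition of removing a bright veil of fog.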
|
|
|
Xialei Liu, Joost Van de Weijer, & Andrew Bagdanov. (2018). Leveraging Unlabeled Data for Crowd Counting by Learning to Rank. In 31st IEEE Conference on Computer Vision and Pattern Recognition (pp. 7661–7669).
Abstract: We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of cropped images, we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number or fewer persons than the super-image. This allows us to address the problem of limited size of existing datasets for crowd counting. We collect two crowd scene datasets from Google using keyword searches and query-by-example image retrieval, respectively. We demonstrate how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network which simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-of-the-art results.
Keywords: Task analysis; Training; Computer vision; Visualization; Estimation; Head; Context modeling
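The ranking constraint described in the abstract above (a sub-image contains the same number or fewer persons than the image containing it) can be illustrated with nested crops and a pairwise hinge loss. This is a minimal sketch: the centred cropping scheme, the `scale` factor, the hinge form of the loss, and the toy density map are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def nested_crops(image, num_crops=3, scale=0.75):
    """Return centred crops of decreasing size; each crop lies inside the previous one."""
    crops = [image]
    h, w = image.shape[:2]
    cy, cx = h // 2, w // 2
    for _ in range(num_crops - 1):
        h, w = max(1, int(h * scale)), max(1, int(w * scale))
        top, left = cy - h // 2, cx - w // 2
        crops.append(image[top:top + h, left:left + w])
    return crops

def ranking_hinge_loss(count_larger, count_smaller, margin=0.0):
    """Zero when the larger image's predicted count is at least the sub-image's count."""
    return max(0.0, count_smaller - count_larger + margin)

# Toy density map (hypothetical): each 1.0 marks one person, so the sum is the count.
density = np.zeros((8, 8))
density[0, 0] = density[2, 5] = density[4, 4] = density[7, 7] = 1.0

counts = [float(c.sum()) for c in nested_crops(density)]
ok_loss = ranking_hinge_loss(counts[0], counts[1])    # ordering respected
bad_loss = ranking_hinge_loss(3.0, 5.0)               # sub-image predicted higher
```

No labels are needed to form such pairs: the ordering follows from containment alone, which is what lets unlabeled crowd images contribute a training signal.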
|
|