|
Patricia Suarez, Angel Sappa, & Boris X. Vintimilla. (2017). Cross-Spectral Image Patch Similarity using Convolutional Neural Network. In IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics.
Abstract: The ability to compare image regions (patches) has been the basis of many approaches to core computer vision problems, including object, texture and scene categorization. Hence, developing representations for image patches has been of interest in several works. The current work focuses on learning similarity between cross-spectral image patches with a 2-channel convolutional neural network (CNN) model. The proposed approach is an adaptation of a previous work, aiming to obtain results similar to the state of the art but with low-cost hardware. The obtained results are compared with both classical approaches, showing improvements, and a state-of-the-art CNN-based approach.
|
|
|
Patricia Suarez, Angel Sappa, & Boris X. Vintimilla. (2018). Cross-spectral image dehaze through a dense stacked conditional GAN based approach. In 14th IEEE International Conference on Signal Image Technology & Internet Based System.
Abstract: This paper proposes a novel approach to remove haze from RGB images using near infrared images, based on a dense stacked conditional Generative Adversarial Network (CGAN). The architecture of the deep network implemented receives, besides the images with haze, their corresponding images in the near infrared spectrum, which serve to accelerate the learning of the details of the image characteristics. The model uses a triplet layer that allows independent learning of each channel of the visible spectrum image, so that the haze is removed on each color channel separately. A multiple loss function scheme is proposed, which ensures balanced learning between the colors and the structure of the images. Experimental results have shown that the proposed method effectively removes the haze from the images. Additionally, the proposed approach is compared with a state-of-the-art approach, showing better results.
Keywords: Infrared imaging; Dense; Stacked CGAN; Cross-spectral; Convolutional networks
|
|
|
Hugo Prol, Vincent Dumoulin, & Luis Herranz. (2018). Cross-Modulation Networks for Few-Shot Learning.
Abstract: A family of recent successful approaches to few-shot learning relies on learning an embedding space in which predictions are made by computing similarities between examples. This corresponds to combining information between support and query examples at a very late stage of the prediction pipeline. Inspired by this observation, we hypothesize that there may be benefits to combining the information at various levels of abstraction along the pipeline. We present an architecture called Cross-Modulation Networks which allows support and query examples to interact throughout the feature extraction process via a feature-wise modulation mechanism. We adapt the Matching Networks architecture to take advantage of these interactions and show encouraging initial results on miniImageNet in the 5-way, 1-shot setting, where we close the gap with the state of the art.
|
|
|
Ajian Liu, Xuan Li, Jun Wan, Yanyan Liang, Sergio Escalera, Hugo Jair Escalante, et al. (2020). Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review. BIO - IET Biometrics, 10(1), 24–43.
Abstract: Face anti-spoofing is critical to prevent face recognition systems from a security breach. The biometrics community has achieved impressive progress recently due to the excellent performance of deep neural networks and the availability of large datasets. Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing. Recently, a multi-ethnic face anti-spoofing dataset, CASIA-SURF CeFA, has been released with the goal of measuring the ethnic bias. It is the largest up-to-date cross-ethnicity face anti-spoofing dataset, covering 3 ethnicities, 3 modalities, 1,607 subjects, and 2D plus 3D attack types, and the first dataset including explicit ethnic labels among the recently released datasets for face anti-spoofing. We organized the Chalearn Face Anti-spoofing Attack Detection Challenge, which consists of single-modal (e.g., RGB) and multi-modal (e.g., RGB, Depth, Infrared (IR)) tracks around this novel resource to boost research aiming to alleviate the ethnic bias. Both tracks attracted 340 teams in the development stage, and finally 11 and 8 teams submitted their code in the single-modal and multi-modal face anti-spoofing recognition challenges, respectively. All the results were verified and re-run by the organizing team, and the results were used for the final ranking. This paper presents an overview of the challenge, including its design, evaluation protocol and a summary of results. We analyze the top-ranked solutions and draw conclusions derived from the competition. In addition, we outline future work directions.
|
|
|
Bojana Gajic, & Ramon Baldrich. (2018). Cross-domain fashion image retrieval. In CVPR 2018 Workshop on Women in Computer Vision (WiCV 2018, 4th Edition) (pp. 19500–19502).
Abstract: Cross-domain image retrieval is a challenging task that implies matching images from one domain to their pairs from another domain. In this paper we focus on fashion image retrieval, which involves matching an image of a fashion item taken by users to the images of the same item taken in controlled conditions, usually by a professional photographer. When facing this problem, we have different products at train and test time, and we use triplet loss to train the network. We stress the importance of proper training of a simple architecture, as well as adapting general models to the specific task.
|
|
|
Mohamed Ilyes Lakhal, Hakan Cevikalp, & Sergio Escalera. (2018). CRN: End-to-end Convolutional Recurrent Network Structure Applied to Vehicle Classification. In 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5, pp. 137–144).
Abstract: Vehicle type classification is considered to be a central part of Intelligent Traffic Systems. In recent years, deep learning methods have emerged as the state of the art in many computer vision tasks. In this paper, we present a novel yet simple deep learning framework for the vehicle type classification problem. We propose an end-to-end trainable system that combines a convolutional neural network for feature extraction and a recurrent neural network as a classifier. The recurrent network structure is used to handle various types of feature inputs, and at the same time allows producing a single class prediction or a set of class predictions. In order to assess the effectiveness of our solution, we have conducted a set of experiments on two public datasets, obtaining state-of-the-art results. In addition, we also report results on the newly released MIO-TCD dataset.
Keywords: Vehicle Classification; Deep Learning; End-to-end Learning
|
|
|
David Lloret, Antonio Lopez, Joan Serrat, & Juan J. Villanueva. (1999). Creaseness-based computer tomography and magnetic resonance registration: Comparison with the mutual information method.
Abstract: This paper describes a method which uses the skull as a landmark for automatic registration of computer tomography to magnetic resonance (MR) images. First, the skull is extracted from both images using a new creaseness operator. Then, the resulting creaseness images are used to build a hierarchic structure which permits a robust and fast search. We have justified experimentally the performance of several choices of our algorithm, and we have thoroughly tested its accuracy and robustness against the well-known mutual information method for five different pairs of images. We have found both comparable, and for certain MR images the proposed method achieves better performance.
|
|
|
Antonio Lopez, David Lloret, & Joan Serrat. (1998). Creaseness measures for CT and MR image registration.
Abstract: Creases are a type of ridge/valley structure that can be characterized by local conditions. Therefore, creaseness refers to local ridgeness and valleyness. The curvature K of the level curves and the mean curvature kM of the level surfaces are good measures of creaseness for 2-d and 3-d images, respectively. However, the way they are computed gives rise to discontinuities, reducing their usefulness in many applications. We propose a new creaseness measure, based on these curvatures, that avoids the discontinuities. We demonstrate its usefulness in the registration of CT and MR brain volumes, from the same patient, by searching for the maximum in the correlation of their creaseness responses (ridgeness from the CT and valleyness from the MR). Due to the high dimensionality of the space of transforms, the search is performed by a hierarchical approach combined with an optimization method at each level of the hierarchy.
|
|
|
Antonio Lopez, Felipe Lumbreras, & Joan Serrat. (1998). Creaseness from level set extrinsic curvature.
|
|
|
A.F. Sole, Antonio Lopez, Cristina Cañero, Petia Radeva, & J. Saludes. (1999). Crease enhancement diffusion.
|
|
|
A.F. Sole, Antonio Lopez, & G. Sapiro. (2001). Crease Enhancement Diffusion. Computer Vision and Image Understanding, 84(2), 241–248 (IF: 1.298).
|
|
|
Yunan Li, Jun Wan, Qiguang Miao, Sergio Escalera, Huijuan Fang, Huizhou Chen, et al. (2020). CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis. IJCV - International Journal of Computer Vision, 128, 2763–2780.
Abstract: First impressions strongly influence social interactions, having a high impact on personal and professional life. In this paper, we present a deep Classification-Regression Network (CR-Net) for analyzing the Big Five personality problem and further assisting in job interview recommendation in a first impressions setup. The setup is based on the ChaLearn First Impressions dataset, including multimodal data with video, audio, and text converted from the corresponding audio data, where each person is talking in front of a camera. In order to give a comprehensive prediction, we analyze the videos from both the entire scene (including the person’s motions and background) and the face of the person. Our CR-Net first performs personality trait classification and applies a regression later, which can obtain accurate predictions for both personality traits and interview recommendation. Furthermore, we present a new loss function called Bell Loss to address inaccurate predictions caused by the regression-to-the-mean problem. Extensive experiments on the First Impressions dataset show the effectiveness of our proposed network, outperforming the state of the art.
|
|
|
J.M. Sanchez, X. Binefa, & J.R. Kender. (2002). Coupled Markov Chains for Video Contents Characterization.
|
|
|
Jiaolong Xu, Sebastian Ramos, David Vazquez, & Antonio Lopez. (2014). Cost-sensitive Structured SVM for Multi-category Domain Adaptation. In 22nd International Conference on Pattern Recognition (pp. 3886–3891). IEEE.
Abstract: Domain adaptation addresses the problem of the accuracy drop that a classifier may suffer when the training data (source domain) and the testing data (target domain) are drawn from different distributions. In this work, we focus on domain adaptation for structured SVM (SSVM). We propose a cost-sensitive domain adaptation method for SSVM, namely COSS-SSVM. In particular, during the re-training of an adapted classifier based on target and source data, the idea that we explore consists in introducing a non-zero cost even for correctly classified source-domain samples. Ultimately, we aim to learn a more target-oriented classifier by not rewarding (zero loss) properly classified source-domain training samples. We assess the effectiveness of COSS-SSVM on multi-category object recognition.
Keywords: Domain Adaptation; Pedestrian Detection
|
|
|
Angel Sappa, & Boris X. Vintimilla. (2007). Cost-Based Closed Contour Representations. Journal of Electronic Imaging, 16(2), 023009 (9 pages).
|
|