|
Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes García-Martinez, Fethi Bougares, et al. (2016). Does Multimodality Help Human and Machine for Translation and Image Captioning? In 1st Conference on Machine Translation.
Abstract: This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
|
|
|
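Jose Ramirez Moreno, Juan R Revilla, Miguel Reyes, & Sergio Escalera. (2016). Validación del Software ADIBAS asociado al sensor Kinect de Microsoft para la evaluación de la posición corporal [Validation of the ADIBAS software associated with the Microsoft Kinect sensor for the assessment of body posture]. In 4th Congreso WCPT-SAR.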
|
|
|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2016). Support Vector Machines with Time Series Distance Kernels for Action Classification. In IEEE Winter Conference on Applications of Computer Vision (pp. 1–7).
Abstract: Despite the strong performance of Support Vector Machines (SVM) on many practical classification problems, the algorithm is not directly applicable to multi-dimensional trajectories having different lengths. In this paper, a new class of SVM that is applicable to trajectory classification, such as action recognition, is developed by incorporating two efficient time-series distance measures into the kernel function. Dynamic Time Warping and Longest Common Subsequence distance measures, along with their derivatives, are employed as the SVM kernel. In addition, the pairwise proximity learning strategy is utilized in order to make use of non-positive semi-definite kernels in the SVM formulation. The proposed method is employed for a challenging classification problem: action recognition by depth cameras using only skeleton data; and evaluated on three benchmark action datasets. Experimental results demonstrate that our methodology outperforms the state-of-the-art on the considered datasets.
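As an illustration of the time-series distance idea in the abstract above, the following is a minimal sketch of Dynamic Time Warping between two trajectories of different lengths, wrapped in an exponential similarity. The function names, the `gamma` parameter, and the `exp(-gamma * d)` form are assumptions made for illustration, not the authors' formulation; the pairwise proximity learning step mentioned in the abstract is not implemented here.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of points.

    a and b are lists of equal-dimension tuples; their lengths may differ.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def dtw_similarity(a, b, gamma=0.1):
    """Hypothetical kernel-like similarity; not guaranteed to be PSD."""
    return math.exp(-gamma * dtw_distance(a, b))
```

Because such a similarity is generally not positive semi-definite, it cannot be assumed to be a valid SVM kernel on its own, which is the issue the abstract addresses with pairwise proximity learning.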
|
|
|
C. Alejandro Parraga, & Arash Akbarinia. (2016). Colour Constancy as a Product of Dynamic Centre-Surround Adaptation. In 16th Annual meeting in Vision Sciences Society (Vol. 16).
Abstract: Colour constancy refers to the human visual system's ability to preserve the perceived colour of objects despite changes in the illumination. Its exact mechanisms are unknown, although a number of systems ranging from retinal to cortical and memory are thought to play important roles. The strength of the perceptual shift necessary to preserve these colours is usually estimated by the vectorial distances from an ideal match (or canonical illuminant). In this work we explore how much of the colour constancy phenomenon could be explained by well-known physiological properties of V1 and V2 neurons whose receptive fields (RF) vary according to the contrast and orientation of surround stimuli. Indeed, it has been shown that both RF size and the normalization occurring between centre and surround in cortical neurons depend on the local properties of surrounding stimuli. Our starting point is the construction of a computational model which includes this dynamical centre-surround adaptation by means of two overlapping asymmetric Gaussian kernels whose variances are adjusted to the contrast of surrounding pixels to represent the changes in RF size of cortical neurons and the weights of their respective contributions are altered according to differences in centre-surround contrast and orientation. The final output of the model is obtained after convolving an image with this dynamical operator and an estimation of the illuminant is obtained by considering the contrast of the far surround. We tested our algorithm on naturalistic stimuli from several benchmark datasets. Our results show that although our model does not require any training, its performance against the state-of-the-art is highly competitive, even outperforming learning-based algorithms in some cases. Indeed, these results are very encouraging if we consider that they were obtained with the same parameters for all datasets (i.e. just like the human visual system operates).
|
|
|
Antoni Gurgui, Debora Gil, Enric Marti, & Vicente Grau. (2016). Left-Ventricle Basal Region Constrained Parametric Mapping to Unitary Domain. In 7th International Workshop on Statistical Atlases & Computational Modelling of the Heart (Vol. 10124, pp. 163–171). LNCS.
Abstract: Due to its complex geometry, the basal ring is often omitted when putting different heart geometries into correspondence. In this paper, we present the first results on a new mapping of the left ventricle basal rings onto a normalized coordinate system using a fold-over free approach to the solution to the Laplacian. To guarantee correspondences between different basal rings, we imposed some internal constrained positions at anatomical landmarks in the normalized coordinate system. To prevent internal fold-overs, constraints are handled by cutting the volume into regions defined by anatomical features and mapping each piece of the volume separately. Initial results presented in this paper indicate that our method is able to handle internal constraints without introducing fold-overs and thus guarantees one-to-one mappings between different basal ring geometries.
Keywords: Laplacian; Constrained maps; Parameterization; Basal ring
|
|
|
Antonio Esteban Lansaque, Carles Sanchez, Agnes Borras, Marta Diez-Ferrer, Antoni Rosell, & Debora Gil. (2016). Stable Airway Center Tracking for Bronchoscopic Navigation. In 28th Conference of the International Society for Medical Innovation and Technology.
Abstract: Bronchoscopists use X-ray fluoroscopy to guide bronchoscopes to the lesion to be biopsied without any kind of incisions. Reducing exposure to X-ray is important for both patients and doctors but alternatives like electromagnetic navigation require specific equipment and increase the cost of the clinical procedure. We propose a guiding system based on the extraction of airway centers from intra-operative videos. Such anatomical landmarks could be matched to the airway centerline extracted from a pre-planned CT to indicate the best path to the lesion. We present an extraction of lumen centers from intra-operative videos based on tracking of maximal stable regions of energy maps.
|
|
|
Carles Sanchez, Debora Gil, T. Gache, N. Koufos, Marta Diez-Ferrer, & Antoni Rosell. (2016). SENSA: a System for Endoscopic Stenosis Assessment. In 28th Conference of the International Society for Medical Innovation and Technology.
Abstract: Documenting the severity of a static or dynamic Central Airway Obstruction (CAO) is crucial to establish proper diagnosis and treatment, predict possible treatment effects and better follow up patients. The subjective visual evaluation of a stenosis during video-bronchoscopy still remains the most common way to assess a CAO in spite of a consensus among experts on the need to standardize all calculations [1].
The Computer Vision Center, in cooperation with the «Hospital de Bellvitge», has developed a System for Endoscopic Stenosis Assessment (SENSA), which computes CAO directly by analyzing standard bronchoscopic data without the need for other imaging technologies.
|
|
|
Youssef El Rhabi, Simon Loic, Brun Luc, Josep Llados, & Felipe Lumbreras. (2016). Information Theoretic Rotationwise Robust Binary Descriptor Learning. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 368–378).
Abstract: In this paper, we propose a new data-driven approach for binary descriptor selection. In order to draw a clear analysis of common designs, we present a general information-theoretic selection paradigm. It encompasses several standard binary descriptor construction schemes, including a recent state-of-the-art one named BOLD. We pursue the same endeavor to increase the stability of the produced descriptors with respect to rotations. To achieve this goal, we have designed a novel offline selection criterion which is better adapted to the online matching procedure. The effectiveness of our approach is demonstrated on two standard datasets, where our descriptor is compared to BOLD and to several classical descriptors. In particular, it emerges that our approach can achieve performance equivalent to, if not better than, that of BOLD while relying on descriptors half as long. Such an improvement can be influential for real-time applications.
|
|
|
Sounak Dey, Anguelos Nicolaou, Josep Llados, & Umapada Pal. (2016). Local Binary Pattern for Word Spotting in Handwritten Historical Document. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 574–583). LNCS.
Abstract: Digital libraries store images which can be highly degraded, and to index this kind of image we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties in complex layout analysis, large variations of writing styles, and degradation or low quality of historical manuscripts. This paper presents a simple, innovative, learning-free method for word spotting from large-scale historical documents combining Local Binary Pattern (LBP) and spatial sampling. This method offers three advantages: firstly, it operates in a completely learning-free paradigm, which is very different from unsupervised learning methods; secondly, the computational time is low because the LBP features are very fast to compute; and thirdly, the method can be used in scenarios where annotations are not available. Finally, we compare the results of our proposed retrieval method with other methods in the literature and we obtain the best results in the learning-free paradigm.
Keywords: Local binary patterns; Spatial sampling; Learning-free; Word spotting; Handwritten; Historical document analysis; Large-scale data
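To illustrate why LBP features are cheap to compute, here is a minimal sketch of the basic 8-neighbour Local Binary Pattern and its histogram on a greyscale image stored as a list of rows. This is the textbook operator, not necessarily the variant used in the paper; the neighbour ordering and the `>=` threshold convention are assumptions, and the spatial sampling step from the abstract is not shown.

```python
# Clockwise 8-neighbourhood starting at the top-left pixel (assumed order).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """8-bit LBP code of the interior pixel at (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # Set the bit when the neighbour is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Each pixel needs only eight comparisons and no training, so a descriptor of this kind can be built without any annotated data.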
|
|
|
Juan Ignacio Toledo, Sebastian Sudholt, Alicia Fornes, Jordi Cucurull, A. Fink, & Josep Llados. (2016). Handwritten Word Image Categorization with Convolutional Neural Networks and Spatial Pyramid Pooling. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (Vol. 10029, pp. 543–552). LNCS. Springer International Publishing.
Abstract: The extraction of relevant information from historical document collections is one of the key steps in order to make these documents available for access and searches. The usual approach combines transcription and grammars in order to extract semantically meaningful entities. In this paper, we describe a new method to obtain word categories directly from non-preprocessed handwritten word images. The method can be used to directly extract information, being an alternative to the transcription. Thus it can be used as a first step in any kind of syntactical analysis. The approach is based on Convolutional Neural Networks with a Spatial Pyramid Pooling layer to deal with the different shapes of the input images. We performed the experiments on a historical marriage record dataset, obtaining promising results.
Keywords: Document image analysis; Word image categorization; Convolutional neural networks; Named entity detection
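The role of the Spatial Pyramid Pooling layer mentioned in the abstract, producing a fixed-length vector from inputs of different shapes, can be sketched as follows. This is a minimal single-channel, max-pooling illustration in plain Python; the pyramid levels `(1, 2, 4)` and the bin-boundary rule are assumptions, not the configuration used by the authors.

```python
def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    For each level n the map is divided into an n x n grid and the
    maximum of each cell is kept, so the output has sum(n * n) values
    regardless of the map's height and width.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0 = (i * h) // n                              # floor of cell start
            r1 = max(((i + 1) * h + n - 1) // n, r0 + 1)   # ceil of cell end
            for j in range(n):
                c0 = (j * w) // n
                c1 = max(((j + 1) * w + n - 1) // n, c0 + 1)
                pooled.append(max(fmap[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled
```

With the default levels, a 3x7 map and a 10x4 map both yield 21 values, which is what lets a fully connected classifier follow convolutional layers applied to word images of varying size.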
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2016). Spotting Symbol over Graphical Documents Via Sparsity in Visual Vocabulary. In Recent Trends in Image Processing and Pattern Recognition (Vol. 709).
|
|
|
Daniel Hernandez, Alejandro Chacon, Antonio Espinosa, David Vazquez, Juan Carlos Moure, & Antonio Lopez. (2016). Stereo Matching using SGM on the GPU.
Abstract: Dense, robust and real-time computation of depth information from stereo-camera systems is a computationally demanding requirement for robotics, advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global Matching (SGM) is a widely used algorithm that propagates consistency constraints along several paths across the image. This work presents a real-time system producing reliable disparity estimation results on the new embedded energy efficient GPU devices. Our design runs on a Tegra X1 at 42 frames per second (fps) for an image size of 640x480, 128 disparity levels, and using 4 path directions for the SGM method.
Keywords: CUDA; Stereo; Autonomous Vehicle
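The propagation of consistency constraints along a path, as described in the abstract above, can be sketched with the standard Semi-Global Matching recurrence for a single path direction. This is a plain-Python illustration of the textbook recurrence, not the authors' GPU implementation; the penalty values `p1` and `p2` and the input layout `costs[x][d]` are assumptions.

```python
def aggregate_path(costs, p1=1, p2=8):
    """Aggregate matching costs along one path (e.g. left to right).

    costs[x][d] is the matching cost of pixel x on the path at disparity d.
    p1 penalises a disparity change of one level, p2 any larger change.
    """
    inf = float("inf")
    n_disp = len(costs[0])
    agg = [list(costs[0])]  # the first pixel has no predecessor
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = min(
                prev[d],                                     # same disparity
                prev[d - 1] + p1 if d > 0 else inf,          # one level down
                prev[d + 1] + p1 if d < n_disp - 1 else inf,  # one level up
                prev_min + p2,                               # larger jump
            )
            # Subtracting prev_min keeps values bounded along the path.
            row.append(costs[x][d] + best - prev_min)
        agg.append(row)
    return agg
```

In a full pipeline the aggregated costs from every path direction (4 in the configuration reported above) would be summed per pixel, and the disparity with the lowest total selected.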
|
|
|
Fernando Vilariño. (2016). Dissemination, creation and education from archives: Case study of the collection of Digitized Visual Poems from Joan Brossa Foundation. In International Workshop on Poetry: Archives, Poetries and Receptions.
|
|
|
G. de Oliveira, A. Cartas, Marc Bolaños, Mariella Dimiccoli, Xavier Giro, & Petia Radeva. (2016). LEMoRe: A Lifelog Engine for Moments Retrieval at the NTCIR-Lifelog LSAT Task. In 12th NTCIR Conference on Evaluation of Information Access Technologies.
Abstract: Semantic image retrieval from large amounts of egocentric visual data requires leveraging powerful techniques for filling in the semantic gap. This paper introduces LEMoRe, a Lifelog Engine for Moments Retrieval, developed in the context of the Lifelog Semantic Access Task (LSAT) of the NTCIR-12 challenge and discusses its performance variation on different trials. LEMoRe integrates classical image descriptors with high-level semantic concepts extracted by Convolutional Neural Networks (CNN), powered by a graphical user interface that uses natural language processing. Although this is just a first attempt towards interactive image retrieval from large egocentric datasets and there is considerable room for improvement of the system components and the user interface, the structure of the system itself and the way the single components cooperate are very promising.
|
|
|
Yaxing Wang, L. Zhang, & Joost Van de Weijer. (2016). Ensembles of generative adversarial networks. In 30th Annual Conference on Neural Information Processing Systems Workshops.
Abstract: Ensembles are a popular way to improve results of discriminative CNNs. The combination of several networks trained starting from different initializations improves results significantly. In this paper we investigate the usage of ensembles of GANs. The specific nature of GANs opens up several new ways to construct ensembles. The first one is based on the fact that in the minimax game which is played to optimize the GAN objective the generator network keeps on changing even after the network can be considered optimal. As such, ensembles of GANs can be constructed from the same network initialization by simply taking models trained for different numbers of iterations. These so-called self ensembles are much faster to train than traditional ensembles. The second method, called cascade GANs, redirects part of the training data which is badly modeled by the first GAN to another GAN. In experiments on the CIFAR10 dataset we show that ensembles of GANs obtain model probability distributions which better model the data distribution. In addition, we show that these improved results can be obtained at little additional computational cost.
|
|