Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes Garcıa-Martinez, Fethi Bougares, et al. (2016). Does Multimodality Help Human and Machine for Translation and Image Captioning? In 1st conference on machine translation.
Abstract: This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate theusefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
|
Katerine Diaz, Aura Hernandez-Sabate, & Antonio Lopez. (2016). A reduced feature set for driver head pose estimation. ASOC - Applied Soft Computing, 45, 98–107.
Abstract: Evaluation of driving performance is of utmost importance in order to reduce road accident rate. Since driving ability includes visual-spatial and operational attention, among others, head pose estimation of the driver is a crucial indicator of driving performance. This paper proposes a new automatic method for coarse and fine head's yaw angle estimation of the driver. We rely on a set of geometric features computed from just three representative facial keypoints, namely the center of the eyes and the nose tip. With these geometric features, our method combines two manifold embedding methods and a linear regression one. In addition, the method has a confidence mechanism to decide if the classification of a sample is not reliable. The approach has been tested using the CMU-PIE dataset and our own driver dataset. Despite the very few facial keypoints required, the results are comparable to the state-of-the-art techniques. The low computational cost of the method and its robustness makes feasible to integrate it in massive consume devices as a real time application.
Keywords: Head pose estimation; driving performance evaluation; subspace based methods; linear regression
|
Egils Avots, M. Daneshmanda, Andres Traumann, Sergio Escalera, & G. Anbarjafaria. (2016). Automatic garment retexturing based on infrared information. CG - Computers & Graphics, 59, 28–38.
Abstract: This paper introduces a new automatic technique for garment retexturing using a single static image along with the depth and infrared information obtained using the Microsoft Kinect II as the RGB-D acquisition device. First, the garment is segmented out from the image using either the Breadth-First Search algorithm or the semi-automatic procedure provided by the GrabCut method. Then texture domain coordinates are computed for each pixel belonging to the garment using normalised 3D information. Afterwards, shading is applied to the new colours from the texture image. As the main contribution of the proposed method, the latter information is obtained based on extracting a linear map transforming the colour present on the infrared image to that of the RGB colour channels. One of the most important impacts of this strategy is that the resulting retexturing algorithm is colour-, pattern- and lighting-invariant. The experimental results show that it can be used to produce realistic representations, which is substantiated through implementing it under various experimentation scenarios, involving varying lighting intensities and directions. Successful results are accomplished also on video sequences, as well as on images of subjects taking different poses. Based on the Mean Opinion Score analysis conducted on many randomly chosen users, it has been shown to produce more realistic-looking results compared to the existing state-of-the-art methods suggested in the literature. From a wide perspective, the proposed method can be used for retexturing all sorts of segmented surfaces, although the focus of this study is on garment retexturing, and the investigation of the configurations is steered accordingly, since the experiments target an application in the context of virtual fitting rooms.
Keywords: Garment Retexturing; Texture Mapping; Infrared Images; RGB-D Acquisition Devices; Shading
|
Marc Masana, Joost Van de Weijer, & Andrew Bagdanov. (2016). On-the-fly Network pruning for object detection. In International conference on learning representations.
Abstract: Object detection with deep neural networks is often performed by passing a few
thousand candidate bounding boxes through a deep neural network for each image.
These bounding boxes are highly correlated since they originate from the same
image. In this paper we investigate how to exploit feature occurrence at the image scale to prune the neural network which is subsequently applied to all bounding boxes. We show that removing units which have near-zero activation in the image allows us to significantly reduce the number of parameters in the network. Results on the PASCAL 2007 Object Detection Challenge demonstrate that up to 40% of units in some fully-connected layers can be entirely eliminated with little change in the detection result.
|
Q. Bao, Marçal Rusiñol, M.Coustaty, Muhammad Muzzamil Luqman, C.D. Tran, & Jean-Marc Ogier. (2016). Delaunay triangulation-based features for Camera-based document image retrieval system. In 12th IAPR Workshop on Document Analysis Systems (pp. 1–6).
Abstract: In this paper, we propose a new feature vector, named DElaunay TRIangulation-based Features (DETRIF), for real-time camera-based document image retrieval. DETRIF is computed based on the geometrical constraints from each pair of adjacency triangles in delaunay triangulation which is constructed from centroids of connected components. Besides, we employ a hashing-based indexing system in order to evaluate the performance of DETRIF and to compare it with other systems such as LLAH and SRIF. The experimentation is carried out on two datasets comprising of 400 heterogeneous-content complex linguistic map images (huge size, 9800 X 11768 pixels resolution)and 700 textual document images.
Keywords: Camera-based Document Image Retrieval; Delaunay Triangulation; Feature descriptors; Indexing
|
Dimosthenis Karatzas, V. Poulain d'Andecy, & Marçal Rusiñol. (2016). Human-Document Interaction – a new frontier for document image analysis. In 12th IAPR Workshop on Document Analysis Systems (pp. 369–374).
Abstract: All indications show that paper documents will not cede in favour of their digital counterparts, but will instead be used increasingly in conjunction with digital information. An open challenge is how to seamlessly link the physical with the digital – how to continue taking advantage of the important affordances of paper, without missing out on digital functionality. This paper
presents the authors’ experience with developing systems for Human-Document Interaction based on augmented document interfaces and examines new challenges and opportunities arising for the document image analysis field in this area. The system presented combines state of the art camera-based document
image analysis techniques with a range of complementary tech-nologies to offer fluid Human-Document Interaction. Both fixed and nomadic setups are discussed that have gone through user testing in real-life environments, and use cases are presented that span the spectrum from business to educational application
|
Marçal Rusiñol, J. Chazalon, & Jean-Marc Ogier. (2016). Filtrage de descripteurs locaux pour l'amélioration de la détection de documents. In Colloque International Francophone sur l'Écrit et le Document.
Abstract: In this paper we propose an effective method aimed at reducing the amount of local descriptors to be indexed in a document matching framework.In an off-line training stage, the matching between the model document and incoming images is computed retaining the local descriptors from the model that steadily produce good matches. We have evaluated this approach by using the ICDAR2015 SmartDOC dataset containing near 25000 images from documents to be captured by a mobile device. We have tested the performance of this filtering step by using ORB and SIFT local detectors and descriptors. The results show an important gain both in quality of the final matching as well as in time and space requirements.
Keywords: Local descriptors; mobile capture; document matching; keypoint selection
|
Alejandro Gonzalez Alzate, Zhijie Fang, Yainuvis Socarras, Joan Serrat, David Vazquez, Jiaolong Xu, et al. (2016). Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison. SENS - Sensors, 16(6), 820.
Abstract: Despite all the significant advances in pedestrian detection brought by computer vision for driving assistance, it is still a challenging problem. One reason is the extremely varying lighting conditions under which such a detector should operate, namely day and night time. Recent research has shown that the combination of visible and non-visible imaging modalities may increase detection accuracy, where the infrared spectrum plays a critical role. The goal of this paper is to assess the accuracy gain of different pedestrian models (holistic, part-based, patch-based) when training with images in the far infrared spectrum. Specifically, we want to compare detection accuracy on test images recorded at day and nighttime if trained (and tested) using (a) plain color images, (b) just infrared images and (c) both of them. In order to obtain results for the last item we propose an early fusion approach to combine features from both modalities. We base the evaluation on a new dataset we have built for this purpose as well as on the publicly available KAIST multispectral dataset.
Keywords: Pedestrian Detection; FIR
|
Anders Hast, & Alicia Fornes. (2016). A Segmentation-free Handwritten Word Spotting Approach by Relaxed Feature Matching. In 12th IAPR Workshop on Document Analysis Systems (pp. 150–155).
Abstract: The automatic recognition of historical handwritten documents is still considered challenging task. For this reason, word spotting emerges as a good alternative for making the information contained in these documents available to the user. Word spotting is defined as the task of retrieving all instances of the query word in a document collection, becoming a useful tool for information retrieval. In this paper we propose a segmentation-free word spotting approach able to deal with large document collections. Our method is inspired on feature matching algorithms that have been applied to image matching and retrieval. Since handwritten words have different shape, there is no exact transformation to be obtained. However, the sufficient degree of relaxation is achieved by using a Fourier based descriptor and an alternative approach to RANSAC called PUMA. The proposed approach is evaluated on historical marriage records, achieving promising results.
|
Juan Ignacio Toledo, Alicia Fornes, Jordi Cucurull, & Josep Llados. (2016). Election Tally Sheets Processing System. In 12th IAPR Workshop on Document Analysis Systems (pp. 364–368).
Abstract: In paper based elections, manual tallies at polling station level produce myriads of documents. These documents share a common form-like structure and a reduced vocabulary worldwide. On the other hand, each tally sheet is filled by a different writer and on different countries, different scripts are used. We present a complete document analysis system for electoral tally sheet processing combining state of the art techniques with a new handwriting recognition subprocess based on unsupervised feature discovery with Variational Autoencoders and sequence classification with BLSTM neural networks. The whole system is designed to be script independent and allows a fast and reliable results consolidation process with reduced operational cost.
|
Joan Mas, Alicia Fornes, & Josep Llados. (2016). An Interactive Transcription System of Census Records using Word-Spotting based Information Transfer. In 12th IAPR Workshop on Document Analysis Systems (pp. 54–59).
Abstract: This paper presents a system to assist in the transcription of historical handwritten census records in a crowdsourcing platform. Census records have a tabular structured layout. They consist in a sequence of rows with information of homes ordered by street address. For each household snippet in the page, the list of family members is reported. The censuses are recorded in intervals of a few years and the information of individuals in each household is quite stable from a point in time to the next one. This redundancy is used to assist the transcriber, so the redundant information is transferred from the census already transcribed to the next one. Household records are aligned from one year to the next one using the knowledge of the ordering by street address. Given an already transcribed census, a query by string word spotting is applied. Thus, names from the census in time t are used as queries in the corresponding home record in time t+1. Since the search is constrained, the obtained precision-recall values are very high, with an important reduction in the transcription time. The proposed system has been tested in a real citizen-science experience where non expert users transcribe the census data of their home town.
|
Eugenio Alcala, Laura Sellart, Vicenc Puig, Joseba Quevedo, Jordi Saludes, David Vazquez, et al. (2016). Comparison of two non-linear model-based control strategies for autonomous vehicles. In 24th Mediterranean Conference on Control and Automation (pp. 846–851).
Abstract: This paper presents the comparison of two nonlinear model-based control strategies for autonomous cars. A control oriented model of vehicle based on a bicycle model is used. The two control strategies use a model reference approach. Using this approach, the error dynamics model is developed. Both controllers receive as input the longitudinal, lateral and orientation errors generating as control outputs the steering angle and the velocity of the vehicle. The first control approach is based on a non-linear control law that is designed by means of the Lyapunov direct approach. The second approach is based on a sliding mode-control that defines a set of sliding surfaces over which the error trajectories will converge. The main advantage of the sliding-control technique is the robustness against non-linearities and parametric uncertainties in the model. However, the main drawback of first order sliding mode is the chattering, so it has been implemented a high order sliding mode control. To test and compare the proposed control strategies, different path following scenarios are used in simulation.
Keywords: Autonomous Driving; Control
|
L. Calvet, A. Ferrer, M. Gomes, A. Juan, & David Masip. (2016). Combining Statistical Learning with Metaheuristics for the Multi-Depot Vehicle Routing Problem with Market Segmentation. CIE - Computers & Industrial Engineering, 94, 93–104.
Abstract: In real-life logistics and distribution activities it is usual to face situations in which the distribution of goods has to be made from multiple warehouses or depots to the nal customers. This problem is known as the Multi-Depot Vehicle Routing Problem (MDVRP), and it typically includes two sequential and correlated stages: (a) the assignment map of customers to depots, and (b) the corresponding design of the distribution routes. Most of the existing work in the literature has focused on minimizing distance-based distribution costs while satisfying a number of capacity constraints. However, no attention has been given so far to potential variations in demands due to the tness of the customerdepot mapping in the case of heterogeneous depots. In this paper, we consider this realistic version of the problem in which the depots are heterogeneous in terms of their commercial oer and customers show dierent willingness to consume depending on how well the assigned depot ts their preferences. Thus, we assume that dierent customer-depot assignment maps will lead to dierent customer-expenditure levels. As a consequence, market-segmentation strategiesneed to be considered in order to increase sales and total income while accounting for the distribution costs. To solve this extension of the MDVRP, we propose a hybrid approach that combines statistical learning techniques with a metaheuristic framework. First, a set of predictive models is generated from historical data. These statistical models allow estimating the demand of any customer depending on the assigned depot. Then, the estimated expenditure of each customer is included as part of an enriched objective function as a way to better guide the stochastic local search inside the metaheuristic framework. A set of computational experiments contribute to illustrate our approach and how the extended MDVRP considered here diers in terms of the proposed solutions from the traditional one.
Keywords: Multi-Depot Vehicle Routing Problem; market segmentation applications; hybrid algorithms; statistical learning
|
Pedro Martins, Paulo Carvalho, & Carlo Gatta. (2016). On the completeness of feature-driven maximally stable extremal regions. PRL - Pattern Recognition Letters, 74, 9–16.
Abstract: By definition, local image features provide a compact representation of the image in which most of the image information is preserved. This capability offered by local features has been overlooked, despite being relevant in many application scenarios. In this paper, we analyze and discuss the performance of feature-driven Maximally Stable Extremal Regions (MSER) in terms of the coverage of informative image parts (completeness). This type of features results from an MSER extraction on saliency maps in which features related to objects boundaries or even symmetry axes are highlighted. These maps are intended to be suitable domains for MSER detection, allowing this detector to provide a better coverage of informative image parts. Our experimental results, which were based on a large-scale evaluation, show that feature-driven MSER have relatively high completeness values and provide more complete sets than a traditional MSER detection even when sets of similar cardinality are considered.
Keywords: Local features; Completeness; Maximally Stable Extremal Regions
|
C. Alejandro Parraga, & Arash Akbarinia. (2016). NICE: A Computational Solution to Close the Gap from Colour Perception to Colour Categorization. Plos - PLoS One, 11(3), e0149538.
Abstract: The segmentation of visible electromagnetic radiation into chromatic categories by the human visual system has been extensively studied from a perceptual point of view, resulting in several colour appearance models. However, there is currently a void when it comes to relate these results to the physiological mechanisms that are known to shape the pre-cortical and cortical visual pathway. This work intends to begin to fill this void by proposing a new physiologically plausible model of colour categorization based on Neural Isoresponsive Colour Ellipsoids (NICE) in the cone-contrast space defined by the main directions of the visual signals entering the visual cortex. The model was adjusted to fit psychophysical measures that concentrate on the categorical boundaries and are consistent with the ellipsoidal isoresponse surfaces of visual cortical neurons. By revealing the shape of such categorical colour regions, our measures allow for a more precise and parsimonious description, connecting well-known early visual processing mechanisms to the less understood phenomenon of colour categorization. To test the feasibility of our method we applied it to exemplary images and a popular ground-truth chart obtaining labelling results that are better than those of current state-of-the-art algorithms.
|