|
Adria Molina, Pau Riba, Lluis Gomez, Oriol Ramos Terrades, & Josep Llados. (2021). Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach. In 16th International Conference on Document Analysis and Recognition (Vol. 12822, pp. 306–320). LNCS.
Abstract: This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate the date estimation as a retrieval task, where given a query, the retrieved images are ranked in terms of the estimated date similarity. The closer their embedded representations are, the closer their dates. Contrary to the traditional models that design a neural network that learns a classifier or a regressor, we propose a learning objective based on the nDCG ranking metric. We have experimentally evaluated the performance of the method in two different tasks: date estimation and date-sensitive image retrieval, using the DEW public database, outperforming the baseline methods.
|
|
|
Josep Llados, Daniel Lopresti, & Seiichi Uchida (Eds.). (2021). 16th International Conference, 2021, Proceedings, Part II (Vol. 12822). LNCS. Springer Cham.
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
|
|
|
Debora Gil, Oriol Ramos Terrades, & Raquel Perez. (2021). Topological Radiomics (TOPiomics): Early Detection of Genetic Abnormalities in Cancer Treatment Evolution. In Extended Abstracts GEOMVAP 2019, Trends in Mathematics 15 (Vol. 15, pp. 89–93). Springer Nature.
Abstract: Abnormalities in radiomic measures correlate to genomic alterations prone to alter the outcome of personalized anti-cancer treatments. TOPiomics is a new method for the early detection of variations in tumor imaging phenotype from a topological structure in multi-view radiomic spaces.
|
|
|
Gemma Rotger, Francesc Moreno-Noguer, Felipe Lumbreras, & Antonio Agudo. (2019). Single view facial hair 3D reconstruction. In 9th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 11867, pp. 423–436). LNCS.
Abstract: In this work, we introduce a novel energy-based framework that addresses the challenging problem of 3D reconstruction of facial hair from a single RGB image. To this end, we identify hair pixels over the image via texture analysis and then determine individual hair fibers that are modeled by means of a parametric hair model based on 3D helixes. We propose to minimize an energy composed of several terms, in order to adapt the hair parameters that better fit the image detections. The final hairs respond to the resulting fibers after a post-processing step where we encourage further realism. The resulting approach generates realistic facial hair fibers from solely an RGB image without assuming any training data nor user interaction. We provide an experimental evaluation on real-world pictures where several facial hair styles and image conditions are observed, showing consistent results and establishing a comparison with respect to competing approaches.
Keywords: 3D Vision; Shape Reconstruction; Facial Hair Modeling
|
|
|
Parichehr Behjati Ardakani, Diego Velazquez, Josep M. Gonfaus, Pau Rodriguez, Xavier Roca, & Jordi Gonzalez. (2019). Catastrophic interference in Disguised Face Recognition. In 9th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 11868, pp. 64–75). LNCS.
Abstract: The natural tendency of artificial neural networks to completely and abruptly forget previously known information when learning new information is well known. We explore this behaviour in the context of Face Verification on the recently proposed Disguised Faces in the Wild dataset (DFW). We empirically evaluate several commonly used DCNN architectures on Face Recognition and distill some insights about the effect of sequential learning on distinct identities from different datasets, showing that the catastrophic forgetness phenomenon is present even in feature embeddings fine-tuned on different tasks from the original domain.
Keywords: Neural network forgetness; Face recognition; Disguised Faces
|
|
|
Arnau Baro, Pau Riba, Jorge Calvo-Zaragoza, & Alicia Fornes. (2018). Optical Music Recognition by Long Short-Term Memory Networks. In B. L. A. Fornes (Ed.), Graphics Recognition. Current Trends and Evolutions (Vol. 11009, pp. 81–95). LNCS. Springer.
Abstract: Optical Music Recognition refers to the task of transcribing the image of a music score into a machine-readable format. Many music scores are written in a single staff, and therefore, they could be treated as a sequence. Therefore, this work explores the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks for reading the music score sequentially, where the LSTM helps in keeping the context. For training, we have used a synthetic dataset of more than 40000 images, labeled at primitive level. The experimental results are promising, showing the benefits of our approach.
Keywords: Optical Music Recognition; Recurrent Neural Network; Long Short-Term Memory
|
|
|
Julie Digne, Mariella Dimiccoli, Neus Sabater, & Philippe Salembier. (2015). Neighborhood Filters and the Recovery of 3D Information. In Handbook of Mathematical Methods in Imaging (pp. 1645–1673). Springer New York.
Abstract: Following their success in image processing (see Chapter Local Smoothing Neighborhood Filters), neighborhood filters have been extended to 3D surface processing. This adaptation is not straightforward. It has led to several variants for surfaces depending on whether the surface is defined as a mesh, or as a raw data point set. The image gray level in the bilateral similarity measure is replaced by geometric information such as the normal or the curvature. The first section of this chapter reviews the variants of 3D mesh bilateral filters and compares them to the simplest possible isotropic filter, the mean curvature motion. In a second part, this chapter reviews applications of the bilateral filter to data composed of a sparse depth map (or of depth cues) and of the image on which they have been computed. Such sparse depth cues can be obtained by stereovision or by psychophysical techniques. The underlying assumption of these applications is that pixels with similar intensity around a region are likely to have similar depths. Therefore, when diffusing depth information with a bilateral filter based on locality and color similarity, the discontinuities in depth are assured to be consistent with the color discontinuities, which is generally a desirable property. In the reviewed applications, this ends up with the reconstruction of a dense perceptual depth map from the joint data of an image and of depth cues.
|
|
|
David Geronimo, & Antonio Lopez. (2014). Vision-based Pedestrian Protection Systems for Intelligent Vehicles. Springer Briefs in Computer Vision.
Abstract: Pedestrian Protection Systems (PPSs) are on-board systems aimed at detecting and tracking people in the surroundings of a vehicle in order to avoid potentially dangerous situations. These systems, together with other Advanced Driver Assistance Systems (ADAS) such as lane departure warning or adaptive cruise control, are one of the most promising ways to improve traffic safety. By the use of computer vision, cameras working either in the visible or infra-red spectra have been demonstrated as a reliable sensor to perform this task. Nevertheless, the variability of human appearance, not only in terms of clothing and sizes but also as a result of their dynamic shape, makes pedestrians one of the most complex classes even for computer vision. Moreover, the unstructured, changing, and unpredictable environment in which such on-board systems must work makes detection a difficult task to be carried out with the demanded robustness. In this brief, the state of the art in PPSs is introduced through the review of the most relevant papers of the last decade. A common computational architecture is presented as a framework to organize each method according to its main contribution. More than 300 papers are referenced, most of them addressing pedestrian detection and others corresponding to the descriptors (features), pedestrian models, and learning machines used. In addition, an overview of topics such as real-time aspects, systems benchmarking and future challenges of this research area is presented.
Keywords: Computer Vision; Driver Assistance Systems; Intelligent Vehicles; Pedestrian Detection; Vulnerable Road Users
|
|
|
C. Alejandro Parraga. (2014). Color Vision, Computational Methods for. In Dieter Jaeger, & Ranu Jung (Eds.), Encyclopedia of Computational Neuroscience (pp. 1–11). Springer-Verlag Berlin Heidelberg.
Abstract: The study of color vision has been aided by a whole battery of computational methods that attempt to describe the mechanisms that lead to our perception of colors in terms of the information-processing properties of the visual system. Their scope is highly interdisciplinary, linking apparently dissimilar disciplines such as mathematics, physics, computer science, neuroscience, cognitive science, and psychology. Since the sensation of color is a feature of our brains, computational approaches usually include biological features of neural systems in their descriptions, from retinal light-receptor interaction to subcortical color opponency, cortical signal decoding, and color categorization. They produce hypotheses that are usually tested by behavioral or psychophysical experiments.
Keywords: Color computational vision; Computational neuroscience of color
|
|
|
Miquel Ferrer, I. Bardaji, Ernest Valveny, Dimosthenis Karatzas, & Horst Bunke. (2013). Median Graph Computation by Means of Graph Embedding into Vector Spaces. In Yun Fu, & Yunqian Ma (Eds.), Graph Embedding for Pattern Analysis (pp. 45–72). Springer New York.
Abstract: In pattern recognition [8, 14], a key issue to be addressed when designing a system is how to represent input patterns. Feature vectors are a common option. That is, a set of numerical features describing relevant properties of the pattern is computed and arranged in vector form. The main advantages of this kind of representation are computational simplicity and a sound mathematical foundation. Thus, a large number of operations are available to work with vectors and a large repository of algorithms for pattern analysis and classification exists. However, the simple structure of feature vectors might not be the best option for complex patterns where nonnumerical features or relations between different parts of the pattern become relevant.
|
|
|
Svebor Karaman, Giuseppe Lisanti, Andrew Bagdanov, & Alberto del Bimbo. (2014). From re-identification to identity inference: Labeling consistency by local similarity constraints. In Person Re-Identification (Vol. 2, pp. 287–307). Springer London.
Abstract: In this chapter, we introduce the problem of identity inference as a generalization of person re-identification. It is most appropriate to distinguish identity inference from re-identification in situations where a large number of observations must be identified without knowing a priori that groups of test images represent the same individual. The standard single- and multishot person re-identification common in the literature are special cases of our formulation. We present an approach to solving identity inference by modeling it as a labeling problem in a Conditional Random Field (CRF). The CRF model ensures that the final labeling gives similar labels to detections that are similar in feature space. Experimental results are given on the ETHZ, i-LIDS and CAVIAR datasets. Our approach yields state-of-the-art performance for multishot re-identification, and our results on the more general identity inference problem demonstrate that we are able to infer the identity of very many examples even with very few labeled images in the gallery.
Keywords: re-identification; Identity inference; Conditional random fields; Video surveillance
|
|
|
Sergio Escalera, Xavier Baro, Oriol Pujol, Jordi Vitria, & Petia Radeva. (2011). Traffic-Sign Recognition Systems. Springer London.
|
|
|
Murad Al Haj, Carles Fernandez, Zhanwu Xiong, Ivan Huerta, Jordi Gonzalez, & Xavier Roca. (2011). Beyond the Static Camera: Issues and Trends in Active Vision. In Th.B. Moeslund, A. Hilton, V. Krüger, & L. Sigal (Eds.), Visual Analysis of Humans: Looking at People (pp. 11–30). Springer London.
Abstract: Maximizing both the area coverage and the resolution per target is highly desirable in many applications of computer vision. However, with a limited number of cameras viewing a scene, the two objectives are contradictory. This chapter is dedicated to active vision systems, trying to achieve a trade-off between these two aims and examining the use of high-level reasoning in such scenarios. The chapter starts by introducing different approaches to active cameras configurations. Later, a single active camera system to track a moving object is developed, offering the reader first-hand understanding of the issues involved. Another section discusses practical considerations in building an active vision platform, taking as an example a multi-camera system developed for a European project. The last section of the chapter reflects upon the future trends of using semantic factors to drive smartly coordinated active systems.
|
|
|
Nataliya Shapovalova, Carles Fernandez, Xavier Roca, & Jordi Gonzalez. (2011). Semantics of Human Behavior in Image Sequences. In Albert Ali Salah, & (Ed.), Computer Analysis of Human Behavior (pp. 151–182). Springer London.
Abstract: Human behavior is contextualized and understanding the scene of an action is crucial for giving proper semantics to behavior. In this chapter we present a novel approach for scene understanding. The emphasis of this work is on the particular case of Human Event Understanding. We introduce a new taxonomy to organize the different semantic levels of the Human Event Understanding framework proposed. Such a framework particularly contributes to the scene understanding domain by (i) extracting behavioral patterns from the integrative analysis of spatial, temporal, and contextual evidence and (ii) integrative analysis of bottom-up and top-down approaches in Human Event Understanding. We will explore how the information about interactions between humans and their environment influences the performance of activity recognition, and how this can be extrapolated to the temporal domain in order to extract higher inferences from human events observed in sequences of images.
|
|
|
Alicia Fornes, & Gemma Sanchez. (2014). Analysis and Recognition of Music Scores. In D. Doermann, & K. Tombre (Eds.), Handbook of Document Image Processing and Recognition (Vol. E, pp. 749–774). Springer London.
Abstract: The analysis and recognition of music scores has attracted the interest of researchers for decades. Optical Music Recognition (OMR) is a classical research field of Document Image Analysis and Recognition (DIAR), whose aim is to extract information from music scores. Music scores contain both graphical and textual information, and for this reason, techniques are closely related to graphics recognition and text recognition. Since music scores use a particular diagrammatic notation that follows the rules of music theory, many approaches make use of context information to guide the recognition and resolve ambiguities. This chapter overviews the main Optical Music Recognition (OMR) approaches. Firstly, the different methods are grouped according to the OMR stages, namely, staff removal, music symbol recognition, and syntactical analysis. Secondly, specific approaches for old and handwritten music scores are reviewed. Finally, online approaches and commercial systems are also discussed.
|
|