|
Jose Manuel Alvarez. (2010). Combining Context and Appearance for Road Detection (Antonio Lopez, & Theo Gevers, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Road traffic crashes have become a major cause of death and injury throughout the world. Hence, in order to improve road safety, the automobile industry is moving towards the development of vehicles with autonomous functionalities such as keeping in the right lane, keeping a safe distance between vehicles, or regulating the speed of the vehicle according to the traffic conditions. A key component of these systems is vision-based road detection, which aims to detect the free road surface ahead of the moving vehicle. Detecting the road using a monocular vision system is very challenging since the road is an outdoor scenario imaged from a mobile platform. Hence, the detection algorithm must be able to deal with continuously changing imaging conditions such as the presence of different objects (vehicles, pedestrians), different environments (urban, highways, off-road), different road types (shape, color), and different imaging conditions (varying illumination, different viewpoints and changing weather conditions). Therefore, in this thesis, we focus on vision-based road detection using a single color camera. More precisely, we first focus on analyzing and grouping pixels according to their low-level properties. In this way, two different approaches are presented to exploit color and photometric invariance. Then, we focus the research of the thesis on exploiting context information. This information provides relevant knowledge about the road, drawing not on pixel features from road regions but on semantic information from the analysis of the scene. In this way, we present two different approaches to infer the geometry of the road ahead of the moving vehicle. Finally, we focus on combining these context and appearance (color) approaches to improve the overall performance of road detection algorithms. The qualitative and quantitative results presented in this thesis on real-world driving sequences show that the proposed method is robust to varying imaging conditions, road types and scenarios, going beyond the state of the art.
|
|
|
Daniel Ponsa. (2007). Model-Based Visual Localisation of Contours and Vehicles (Antonio Lopez, & Xavier Roca, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
|
|
|
Fadi Dornaika, & Angel Sappa. (2008). Real Time Image Registration for Planar Structure and 3D Sensor Pose Estimation. In Asim Bhatti (Ed.), Stereo Vision (Vol. 18, pp. 299–316).
|
|
|
Debora Gil, Agnes Borras, Manuel Ballester, Francesc Carreras, Ruth Aris, Manuel Vazquez, et al. (2011). MIOCARDIA: Integrating cardiac function and muscular architecture for a better diagnosis. In Association for Computing Machinery (Ed.), 14th International Symposium on Applied Sciences in Biomedical and Communication Technologies. Barcelona, Spain.
Abstract: A deep understanding of the myocardial structure of the heart would yield crucial knowledge for clinical and medical procedures. MIOCARDIA is a multidisciplinary project in cooperation with l'Hospital de la Santa Creu i de Sant Pau, Clinica la Creu Blanca and Barcelona Supercomputing Center. The ultimate goal of this project is to define a computational model of the myocardium. The model takes into account the deep interrelation between the anatomy and the mechanics of the heart. The paper explains the workflow of the MIOCARDIA project. It also introduces a multiresolution reconstruction technique based on DT-MRI streamlining for simplified global myocardial model generation. Our reconstructions can restore the most complex myocardial structures and provide evidence of a global helical organization.
|
|
|
Jose Elias Yauri. (2023). Deep Learning Based Data Fusion Approaches for the Assessment of Cognitive States on EEG Signals (Aura Hernandez, & Debora Gil, Eds.). Ph.D. thesis, IMPRIMA.
Abstract: For millennia, the study of the brain-mind relationship has fascinated humanity in its effort to understand the complex nature of cognitive states. A cognitive state is the state of the mind at a specific time and involves cognitive activities to acquire and process information for making a decision, solving a problem, or achieving a goal.
While normal cognitive states assist in the successful accomplishment of tasks, abnormal states of the mind can lead to task failures due to a reduced cognitive capability. In this thesis, we focus on the assessment of cognitive states by means of the analysis of electroencephalogram (EEG) signals using deep learning methods. EEG records the electrical activity of the brain using a set of electrodes placed on the scalp that output a set of spatiotemporal signals that are expected to be correlated to a specific mental process.
From the point of view of artificial intelligence, any method for the assessment of cognitive states using EEG signals as input should face several challenges. On the one hand, one should determine which is the most suitable approach for the optimal combination of the multiple signals recorded by EEG electrodes. On the other hand, one should have a protocol for the collection of good quality unambiguous annotated data, and an experimental design for the assessment of the generalization and transfer of models. In order to tackle them, first, we propose several convolutional neural architectures to perform data fusion of the signals recorded by EEG electrodes, at raw signal and feature levels. Four channel fusion methods, easy to incorporate into any neural network architecture, are proposed and assessed. Second, we present a method to create an unambiguous dataset for the prediction of cognitive mental workload using serious games and an Airbus-320 flight simulator. Third, we present a validation protocol that takes into account the levels of generalization of models based on the source and amount of test data.
Finally, the approaches for the assessment of cognitive states are applied to two use cases of high social impact: the assessment of mental workload for personalized support systems in the cockpit and the detection of epileptic seizures. The results obtained from the first use case show the feasibility of task transfer of models trained to detect workload in serious games to real flight scenarios. The results from the second use case show the generalization capability of our EEG channel fusion methods at k-fold cross-validation, patient-specific, and population levels.
|
|
|
Hana Jarraya, Muhammad Muzzamil Luqman, & Jean-Yves Ramel. (2017). Improving Fuzzy Multilevel Graph Embedding Technique by Employing Topological Node Features: An Application to Graphics Recognition. In B. Lamiroy, & R Dueire Lins (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 9657). LNCS. Springer.
|
|
|
Alicia Fornes, V.C. Kieu, M. Visani, N. Journet, & Anjan Dutta. (2014). The ICDAR/GREC 2013 Music Scores Competition: Staff Removal. In B. Lamiroy, & J.-M. Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 207–220). LNCS. Springer Berlin Heidelberg.
Abstract: The first competition on music scores that was organized at ICDAR and GREC in 2011 awoke the interest of researchers, who participated in both staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real case scenario concerning old and degraded music scores. For this purpose, we have generated a new set of semi-synthetic images using two degradation models that we previously introduced: local noise and 3D distortions. In this extended paper we provide a fuller description of the dataset, degradation models, evaluation metrics, the participants' methods and the obtained results that could not be presented in the ICDAR and GREC proceedings due to page limitations.
Keywords: Competition; Graphics recognition; Music scores; Writer identification; Staff removal
|
|
|
Anjan Dutta, Josep Llados, Horst Bunke, & Umapada Pal. (2014). A Product Graph Based Method for Dual Subgraph Matching Applied to Symbol Spotting. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 7–11). LNCS. Springer Berlin Heidelberg.
Abstract: The product graph has been shown to be a way of matching subgraphs. This paper reports the extension of the product graph methodology for subgraph matching applied to symbol spotting in graphical documents. Here we focus on the two major limitations of the previous version of the algorithm: (1) spurious nodes and edges in the graph representation and (2) inefficient node and edge attributes. To deal with noisy information of vectorized graphical documents, we consider a dual edge graph representation on the original graph representing the graphical information, and the product graph is computed between the dual edge graphs of the pattern graph and the target graph. The dual edge graph with redundant edges is helpful for an efficient and tolerant encoding of the structural information of the graphical documents. The adjacency matrix of the product graph locates the pairs of similar edges of the two operand graphs, and exponentiating the adjacency matrix finds similar random walks of greater lengths. Nodes joining similar random walks between two graphs are found by combining different weighted exponentials of adjacency matrices. An experimental investigation reveals that the recall obtained by this approach is quite encouraging.
Keywords: Product graph; Dual edge graph; Subgraph matching; Random walks; Graph kernel
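Illustrative note (not part of the cited paper): the abstract above describes locating similar edge pairs through the adjacency matrix of a product graph and finding longer similar walks by taking powers of that matrix. The following is a minimal sketch of that general idea only, assuming plain unattributed 0/1 adjacency matrices and a tensor (direct) product; the function names, the `decay` weighting and the omission of the dual edge graph and of node and edge attributes are all assumptions made for illustration.

```python
from itertools import product


def tensor_product_adjacency(a, b):
    """Adjacency matrix of the tensor (direct) product of two graphs.

    Node (i, k) of the product is stored at index i * len(b) + k. Two product
    nodes are adjacent only if both of their components are adjacent.
    """
    n, m = len(a), len(b)
    p = [[0] * (n * m) for _ in range(n * m)]
    for i, j in product(range(n), repeat=2):
        for k, l in product(range(m), repeat=2):
            p[i * m + k][j * m + l] = a[i][j] * b[k][l]
    return p


def matmul(x, y):
    """Plain square matrix product (hypothetical helper)."""
    size = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]


def weighted_walk_sum(p, max_len, decay):
    """Sum of decay**k * P**k for k = 1..max_len.

    Entry [u][v] accumulates the weighted number of common walks of length
    up to max_len between product nodes u and v.
    """
    size = len(p)
    total = [[0.0] * size for _ in range(size)]
    power = [row[:] for row in p]
    weight = decay
    for _ in range(max_len):
        for i in range(size):
            for j in range(size):
                total[i][j] += weight * power[i][j]
        power = matmul(power, p)
        weight *= decay
    return total


# Toy graphs (assumed): a 3-node path as "target" and a single edge as "pattern".
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
edge_graph = [[0, 1], [1, 0]]
prod_adj = tensor_product_adjacency(path_graph, edge_graph)
walks = weighted_walk_sum(prod_adj, max_len=2, decay=0.5)
```

High entries of `walks` point at node pairs joined by many common walks; how those scores are turned into spotted symbols is specific to the paper and is not reproduced here.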
|
|
|
Klaus Broelemann, Anjan Dutta, Xiaoyi Jiang, & Josep Llados. (2014). Hierarchical Plausibility-Graphs for Symbol Spotting in Graphical Documents. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 25–37). LNCS. Springer Berlin Heidelberg.
Abstract: Graph representations of graphical documents often suffer from noise such as spurious nodes and edges, and from discontinuities. In general these errors occur during the low-level image processing, viz. binarization, skeletonization, vectorization etc. Hierarchical graph representation is a convenient and efficient way to solve this kind of problem by hierarchically merging node-node and node-edge pairs depending on the distance. But the creation of a hierarchical graph representing the graphical information often uses hard thresholds on the distance to create the hierarchical nodes (next state) of the lower nodes (or states) of a graph. As a result, the representation often loses useful information. This paper introduces plausibilities to the nodes of the hierarchical graph as a function of distance and proposes a modified algorithm for matching subgraphs of the hierarchical graphs. The plausibility-annotated nodes help to improve the performance of the matching algorithm on two hierarchical structures. To show the potential of this approach, we conduct an experiment with the SESYD dataset.
|
|
|
Marçal Rusiñol, Dimosthenis Karatzas, & Josep Llados. (2014). Spotting Graphical Symbols in Camera-Acquired Documents in Real Time. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 3–10). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we present a system devoted to spotting graphical symbols in camera-acquired document images. The system is based on the extraction and further matching of ORB compact local features computed over interest key-points. Then, the FLANN indexing framework based on approximate nearest neighbor search allows local descriptors to be matched efficiently between the captured scene and the graphical models. Finally, the RANSAC algorithm is used in order to compute the homography between the spotted symbol and its appearance in the document image. The proposed approach is efficient and is able to work in real time.
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas, & Josep Llados. (2014). Classification of Administrative Document Images by Logo Identification. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 49–58). Springer Berlin Heidelberg.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier’s graphical logo. Two different methods are proposed: the first one uses a bag-of-visual-words model, whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminary results are reported with a dataset of real administrative documents.
Keywords: Administrative Document Classification; Logo Recognition; Logo Spotting
|
|
|
Pau Riba, Alicia Fornes, & Josep Llados. (2017). Towards the Alignment of Handwritten Music Scores. In Bart Lamiroy, & R Dueire Lins (Eds.), International Workshop on Graphics Recognition. GREC 2015. Graphic Recognition. Current Trends and Challenges (Vol. 9657, pp. 103–116). LNCS.
Abstract: It is very common to find different versions of the same music work in archives of Opera Theaters. These differences correspond to modifications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study.
This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such differences. Given the difficulties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the staff lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
Keywords: Optical Music Recognition; Handwritten Music Scores; Dynamic Time Warping alignment
|
|
|
Pau Riba, Alicia Fornes, & Josep Llados. (2015). Towards the Alignment of Handwritten Music Scores. In Bart Lamiroy, & Rafael Dueire Lins (Eds.), 11th IAPR International Workshop on Graphics Recognition. LNCS. Springer International Publishing.
Abstract: It is very common to find different versions of the same music work in archives of Opera Theaters. These differences correspond to modifications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such differences. Given the difficulties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the staff lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
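Illustrative note (not part of the cited paper): the abstract above aligns sequences of bar units with Dynamic Time Warping and inspects the alignment path to find variations. The sketch below shows textbook DTW with path backtracking, assuming each bar unit is already reduced to some descriptor and that a distance function `dist` between descriptors is supplied by the caller; the function name, the toy one-number descriptors and the absolute-difference distance are assumptions, not the Blurred Shape Model descriptors used by the authors.

```python
def dtw_align(seq_a, seq_b, dist):
    """Classic dynamic time warping between two non-empty sequences.

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example (assumed data): the second "score" repeats one bar.
bars_a = [1, 2, 3]
bars_b = [1, 2, 2, 3]
total_cost, alignment = dtw_align(bars_a, bars_b, lambda x, y: abs(x - y))
```

In this toy run, bar 1 of the first sequence is matched to two consecutive bars of the second one; steps of that kind in the path, or matched pairs with a high local distance, are the sort of signal that could flag an inserted or modified passage.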
|
|
|
Fadi Dornaika, Bogdan Raducanu, & Alireza Bosaghzadeh. (2015). Facial expression recognition based on multi observations with application to social robotics. In Bruce Flores (Ed.), Emotional and Facial Expressions: Recognition, Developmental Differences and Social Importance (pp. 153–166). Nova Science Publishers.
Abstract: Human-robot interaction is a hot topic nowadays in the social robotics community. One crucial aspect is affective communication, which is encoded in facial expressions. In this chapter, we propose a novel approach for facial expression recognition, which exploits an efficient and adaptive graph-based label propagation (semi-supervised mode) in a multi-observation framework. The facial features are extracted using an appearance-based 3D face tracker that is view- and texture-independent. Our method has been extensively tested on the CMU dataset, and has been compared with other methods for graph construction. With the proposed approach, we developed an application for an AIBO robot, in which it mirrors the recognized facial expression.
|
|
|
Arash Akbarinia. (2017). Computational Model of Visual Perception: From Colour to Form (C. Alejandro Parraga, Ed.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: The original idea of this project was to study the role of colour in the challenging task of object recognition. We started by extending previous research on colour naming, showing that it is feasible to capture colour terms through parsimonious ellipsoids. Although the results of our model exceeded the state of the art on two benchmark datasets, we realised that the two phenomena of metameric lights and colour constancy must be addressed prior to any further colour processing. Our investigation of metameric pairs reached the conclusion that they are infrequent in real-world scenarios. In contrast, the illumination of a scene often changes dramatically. We addressed this issue by proposing a colour constancy model inspired by the dynamical centre-surround adaptation of neurons in the visual cortex. This was implemented through two overlapping asymmetric Gaussians whose variances and heights are adjusted according to the local contrast of pixels. We complemented this model with a generic contrast-variant pooling mechanism that inversely connects the percentage of pooled signal to the local contrast of a region. The results of our experiments on four benchmark datasets were indeed promising: the proposed model, although simple, outperformed even learning-based approaches in many cases. Encouraged by the success of our contrast-variant surround modulation, we extended this approach to detect boundaries of objects. We proposed an edge detection model based on the first derivative of the Gaussian kernel. We incorporated four types of surround: full, far, iso- and orthogonal-orientation. Furthermore, we accounted for the pooling mechanism at higher cortical areas and the shape feedback sent to lower areas. Our results on three benchmark datasets showed significant improvement over non-learning algorithms.
To summarise, we demonstrated that biologically-inspired models offer promising solutions to computer vision problems such as colour naming, colour constancy and edge detection. We believe that the greatest contribution of this Ph.D. dissertation is modelling the concept of dynamic surround modulation, which shows the significance of contrast-variant surround integration. The models proposed here are grounded on only a portion of what we know about the human visual system. Therefore, it is only natural to complement them accordingly in future work.
|
|