Carles Sanchez, Debora Gil, Jorge Bernal, F. Javier Sanchez, Marta Diez-Ferrer, & Antoni Rosell. (2016). Navigation Path Retrieval from Videobronchoscopy using Bronchial Branches. In 19th International Conference on Medical Image Computing and Computer Assisted Intervention Workshops (Vol. 9401, pp. 62–70). LNCS.
Abstract: Bronchoscopy biopsy can be used to diagnose lung cancer without risking complications of other interventions like transthoracic needle aspiration. During bronchoscopy, the clinician has to navigate through the bronchial tree to the target lesion. A main drawback is the difficulty to check whether the exploration is following the correct path. The usual guidance using fluoroscopy implies repeated radiation of the clinician, while alternative systems (like electromagnetic navigation) require specific equipment that increases intervention costs. We propose to compute the navigated path using anatomical landmarks extracted from the sole analysis of videobronchoscopy images. Such landmarks allow matching the current exploration to the path previously planned on a CT to indicate clinician whether the planning is being correctly followed or not. We present a feasibility study of our landmark based CT-video matching using bronchoscopic videos simulated on a virtual bronchoscopy interactive interface.
Keywords: Bronchoscopy navigation; Lumen center; Brochial branches; Navigation path; Videobronchoscopy
|
Lluis Gomez. (2016). Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding (Dimosthenis Karatzas, Ed.). Ph.D. thesis, , .
Abstract: This thesis addresses the problem of automatic scene text understanding in unconstrained conditions. In particular, we tackle the tasks of multi-language and arbitrary-oriented text detection, tracking, and script identification in natural scenes.
For this we have developed a set of generic methods that build on top of the basic observation that text has always certain key visual and structural characteristics that are independent of the language or script in which it is written. Text instances in any
language or script are always formed as groups of similar atomic parts, being them either individual characters, small stroke parts, or even whole words in the case of cursive text. This holistic (sumof-parts) and recursive perspective has lead us to explore different variants of the “segmentation and grouping” paradigm of computer vision.
Scene text detection methodologies are usually based in classification of individual regions or patches, using a priory knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organization through which
text emerges as a perceptually significant group of atomic objects.
In this thesis, we argue that the text detection problem must be posed as the detection of meaningful groups of regions. We address the problem of text detection in natural scenes from a hierarchical perspective, making explicit use of the recursive nature of text, aiming directly to the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such an hierarchy introducing a feature space designed to produce text group hypothese with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based in perceptual organization. Within this generic framework, we design a text-specific object proposals algorithm that, contrary to existing generic object proposals methods, aims directly to the detection of text regions groupings. For this, we abandon the rigid definition of “what is text” of traditional specialized text detectors, and move towards more fuzzy perspective of grouping-based object proposals methods.
Then, we present a hybrid algorithm for detection and tracking of scene text where the notion of region groupings plays also a central role. By leveraging the structural arrangement of text group components between consecutive frames we can improve
the overall tracking performance of the system.
Finally, since our generic detection framework is inherently designed for multi-language environments, we focus on the problem of script identification in order to build a multi-language end-toend reading system. Facing this problem with state of the art CNN classifiers is not straightforward, as they fail to address a key
characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed size as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme.
|
Jordi Roca. (2012). Constancy and inconstancy in categorical colour perception (Maria Vanrell, & C. Alejandro Parraga, Eds.). Ph.D. thesis, , .
Abstract: To recognise objects is perhaps the most important task an autonomous system, either biological or artificial needs to perform. In the context of human vision, this is partly achieved by recognizing the colour of surfaces despite changes in the wavelength distribution of the illumination, a property called colour constancy. Correct surface colour recognition may be adequately accomplished by colour category matching without the need to match colours precisely, therefore categorical colour constancy is likely to play an important role for object identification to be successful. The main aim of this work is to study the relationship between colour constancy and categorical colour perception. Previous studies of colour constancy have shown the influence of factors such the spatio-chromatic properties of the background, individual observer's performance, semantics, etc. However there is very little systematic study of these influences. To this end, we developed a new approach to colour constancy which includes both individual observers' categorical perception, the categorical structure of the background, and their interrelations resulting in a more comprehensive characterization of the phenomenon. In our study, we first developed a new method to analyse the categorical structure of 3D colour space, which allowed us to characterize individual categorical colour perception as well as quantify inter-individual variations in terms of shape and centroid location of 3D categorical regions. Second, we developed a new colour constancy paradigm, termed chromatic setting, which allows measuring the precise location of nine categorically-relevant points in colour space under immersive illumination. Additionally, we derived from these measurements a new colour constancy index which takes into account the magnitude and orientation of the chromatic shift, memory effects and the interrelations among colours and a model of colour naming tuned to each observer/adaptation state. Our results lead to the following conclusions: (1) There exists large inter-individual variations in the categorical structure of colour space, and thus colour naming ability varies significantly but this is not well predicted by low-level chromatic discrimination ability; (2) Analysis of the average colour naming space suggested the need for an additional three basic colour terms (turquoise, lilac and lime) for optimal colour communication; (3) Chromatic setting improved the precision of more complex linear colour constancy models and suggested that mechanisms other than cone gain might be best suited to explain colour constancy; (4) The categorical structure of colour space is broadly stable under illuminant changes for categorically balanced backgrounds; (5) Categorical inconstancy exists for categorically unbalanced backgrounds thus indicating that categorical information perceived in the initial stages of adaptation may constrain further categorical perception.
|
Ivet Rafegas, & Maria Vanrell. (2016). Color spaces emerging from deep convolutional networks. In 24th Color and Imaging Conference (pp. 225–230).
Abstract: Award for the best interactive session
Defining color spaces that provide a good encoding of spatio-chromatic properties of color surfaces is an open problem in color science [8, 22]. Related to this, in computer vision the fusion of color with local image features has been studied and evaluated [16]. In human vision research, the cells which are selective to specific color hues along the visual pathway are also a focus of attention [7, 14]. In line with these research aims, in this paper we study how color is encoded in a deep Convolutional Neural Network (CNN) that has been trained on more than one million natural images for object recognition. These convolutional nets achieve impressive performance in computer vision, and rival the representations in human brain. In this paper we explore how color is represented in a CNN architecture that can give some intuition about efficient spatio-chromatic representations. In convolutional layers the activation of a neuron is related to a spatial filter, that combines spatio-chromatic representations. We use an inverted version of it to explore the properties. Using a series of unsupervised methods we classify different type of neurons depending on the color axes they define and we propose an index of color-selectivity of a neuron. We estimate the main color axes that emerge from this trained net and we prove that colorselectivity of neurons decreases from early to deeper layers.
|
Ivet Rafegas, & Maria Vanrell. (2016). Colour Visual Coding in trained Deep Neural Networks. In European Conference on Visual Perception.
|
Marçal Rusiñol, & Josep Llados. (2017). Flowchart Recognition in Patent Information Retrieval. In M. Lupu, K. Mayer, N. Kando, & A.J. Trippe (Eds.), Current Challenges in Patent Information Retrieval (Vol. 37, pp. 351–368). Springer Berlin Heidelberg.
|
Alicia Fornes, Josep Llados, Oriol Ramos Terrades, & Marçal Rusiñol. (2016). La Visió per Computador com a Eina per a la Interpretació Automàtica de Fonts Documentals. Lligall, Revista Catalana d'Arxivística, 20–46.
|
Arash Akbarinia, & C. Alejandro Parraga. (2016). Dynamically Adjusted Surround Contrast Enhances Boundary Detection, European Conference on Visual Perception. In European Conference on Visual Perception.
|
Yaxing Wang, L. Zhang, & Joost Van de Weijer. (2016). Ensembles of generative adversarial networks. In 30th Annual Conference on Neural Information Processing Systems Worshops.
Abstract: Ensembles are a popular way to improve results of discriminative CNNs. The
combination of several networks trained starting from different initializations
improves results significantly. In this paper we investigate the usage of ensembles of GANs. The specific nature of GANs opens up several new ways to construct ensembles. The first one is based on the fact that in the minimax game which is played to optimize the GAN objective the generator network keeps on changing even after the network can be considered optimal. As such ensembles of GANs can be constructed based on the same network initialization but just taking models which have different amount of iterations. These so-called self ensembles are much faster to train than traditional ensembles. The second method, called cascade GANs, redirects part of the training data which is badly modeled by the first GAN to another GAN. In experiments on the CIFAR10 dataset we show that ensembles of GANs obtain model probability distributions which better model the data distribution. In addition, we show that these improved results can be obtained at little additional computational cost.
|
Guim Perarnau, Joost Van de Weijer, Bogdan Raducanu, & Jose Manuel Alvarez. (2016). Invertible conditional gans for image editing. In 30th Annual Conference on Neural Information Processing Systems Worshops.
Abstract: Generative Adversarial Networks (GANs) have recently demonstrated to successfully approximate complex data distributions. A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information allows to determine specific representations of the generated images. In this work, we evaluate encoders to inverse the mapping of a cGAN, i.e., mapping a real image into a latent space and a conditional representation. This allows, for example, to reconstruct and modify real images of faces conditioning on arbitrary attributes.
Additionally, we evaluate the design of cGANs. The combination of an encoder
with a cGAN, which we call Invertible cGAN (IcGAN), enables to re-generate real
images with deterministic complex modifications.
|
Joana Maria Pujadas-Mora, Alicia Fornes, Josep Llados, & Anna Cabre. (2016). Bridging the gap between historical demography and computing: tools for computer-assisted transcription and the analysis of demographic sources. In K.Matthijs, S.Hin, H.Matsuo, & J.Kok (Eds.), The future of historical demography. Upside down and inside out (pp. 127–131). Acco Publishers.
|
Oriol Vicente, Alicia Fornes, & Ramon Valdes. (2016). The Digital Humanities Network of the UABCie: a smart structure of research and social transference for the digital humanities. In Digital Humanities Centres: Experiences and Perspectives.
|
Veronica Romero, Alicia Fornes, Enrique Vidal, & Joan Andreu Sanchez. (2016). Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books. In 15th international conference on Frontiers in Handwriting Recognition.
Abstract: Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demography studies and
genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction is difficult due to the distinct and evolutionary vocabulary, which is composed mainly of proper names that change along the time. In previous
works we studied the use of category-based language models to both improve the automatic transcription accuracy and make easier the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the language model obtained. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach.
|
Xavier Baro, Sergio Escalera, Isabelle Guyon, Julio C. S. Jacques Junior, Lukasz Romaszko, Lisheng Sun, et al. (2016). Coompetitions in machine learning: case studies. In 30th Annual Conference on Neural Information Processing Systems Worshops.
|
Angel Valencia, Roger Idrovo, Angel Sappa, Douglas Plaza, & Daniel Ochoa. (2017). A 3D Vision Based Approach for Optimal Grasp of Vacuum Grippers. In IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics.
Abstract: In general, robot grasping approaches are based on the usage of multi-finger grippers. However, when large size objects need to be manipulated vacuum grippers are preferred, instead of finger based grippers. This paper aims to estimate the best picking place for a two suction cups vacuum gripper,
when planar objects with an unknown size and geometry are considered. The approach is based on the estimation of geometric properties of object’s shape from a partial cloud of points (a single 3D view), in such a way that combine with considerations of a theoretical model to generate an optimal contact point
that minimizes the vacuum force needed to guarantee a grasp.
Experimental results in real scenarios are presented to show the validity of the proposed approach.
|