Publicacions CVC -- Query Results

	1126–1140 of 3413 records found matching your query:

	Records
	Author	Y. Patel; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas
	Title	Dynamic Lexicon Generation for Natural Scene Images			Type	Conference Article
	Year	2016	Publication	14th European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	395-410
	Keywords	scene text; photo OCR; scene understanding; lexicon generation; topic modeling; CNN
	Abstract	Many scene text understanding methods approach the end-to-end recognition problem from a word-spotting perspective and benefit greatly from using small per-image lexicons. Such customized lexicons are normally assumed as given and their source is rarely discussed. In this paper we propose a method that generates contextualized lexicons for scene images using only visual information. For this, we exploit the correlation between visual and textual information in a dataset consisting of images and textual content associated with them. Using the topic modeling framework to discover a set of latent topics in such a dataset allows us to re-rank a fixed dictionary in a way that prioritizes the words that are more likely to appear in a given image. Moreover, we train a CNN that is able to reproduce those word rankings but using only the image raw pixels as input. We demonstrate that the quality of the automatically obtained custom lexicons is superior to a generic frequency-based baseline.
	Address	Amsterdam; The Netherlands; October 2016
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.084			Approved	no
	Call Number	Admin @ si @ PGR2016			Serial	2825
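
The re-ranking idea in the abstract above can be illustrated with a minimal sketch. It assumes that a topic model supplies P(word | topic) and that some predictor supplies P(topic | image); the topic names, words, and probabilities below are hypothetical values chosen for illustration, not data or code from the paper.

```python
# Hypothetical topic-word probabilities P(word | topic) from a topic model.
WORD_GIVEN_TOPIC = {
    "street": {"stop": 0.5, "taxi": 0.4, "pizza": 0.1},
    "food": {"stop": 0.1, "taxi": 0.1, "pizza": 0.8},
}


def rank_lexicon(topic_given_image):
    """Rank a fixed dictionary by P(word | image) = sum_t P(word | t) * P(t | image)."""
    scores = {}
    for topic, p_topic in topic_given_image.items():
        for word, p_word in WORD_GIVEN_TOPIC[topic].items():
            scores[word] = scores.get(word, 0.0) + p_word * p_topic
    # Most likely words first; the top-k entries would form the per-image lexicon.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical topic distributions predicted for two images.
street_ranking = rank_lexicon({"street": 0.8, "food": 0.2})
food_ranking = rank_lexicon({"street": 0.1, "food": 0.9})
```

In this sketch the same dictionary is ordered differently for each image, which is the effect the abstract describes; how the topic distribution is obtained from pixels is outside the sketch.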



	Author	Sergio Escalera
	Title	Coding and Decoding Design of ECOCs for Multi-class Pattern and Object Recognition			Type	Book Whole
	Year	2008	Publication	PhD Thesis, Universitat de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Many real problems require multi-class decisions. In the Pattern Recognition field, many techniques have been proposed to deal with the binary problem. However, the extension of many 2-class classifiers to the multi-class case is a hard task. In this sense, Error-Correcting Output Codes (ECOC) have been shown to be a powerful tool to combine any number of binary classifiers to model multi-class problems. But there are still many open issues about the capabilities of the ECOC framework. In this thesis, the two main stages of an ECOC design are analyzed: the coding and the decoding steps. We present different problem-dependent designs. These designs take advantage of the knowledge of the problem domain to minimize the number of classifiers, obtaining a high classification performance. On the other hand, we analyze the ECOC codification in order to define new decoding rules that take full benefit from the information provided at the coding step. Moreover, as a successful classification requires a rich feature set, new feature detection/extraction techniques are presented and evaluated on the new ECOC designs. The evaluation of the new methodology is performed on different real and synthetic data sets: UCI Machine Learning Repository, handwriting symbols, traffic signs from a Mobile Mapping System, Intravascular Ultrasound images, the Caltech Repository data set and a Chagas disease data set. The results of this thesis show that significant performance improvements are obtained on both traditional coding and decoding ECOC designs when the new coding and decoding rules are taken into account.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Petia Radeva;Oriol Pujol
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; HuPBA			Approved	no
	Call Number	Admin @ si @ Esc2008b			Serial	2217
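
As background for the coding and decoding steps mentioned in the abstract above, here is a minimal sketch of a generic ECOC scheme with Hamming decoding. It is not one of the thesis's problem-dependent designs or new decoding rules; the class labels and the code matrix are hypothetical, and the binary classifier outputs are supplied by hand.

```python
# Hypothetical code matrix for 4 classes: one codeword (row) per class,
# one column per binary classifier. Every pair of rows differs in 4 bits.
CODE_MATRIX = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(outputs):
    """Return the class whose codeword is closest to the classifier outputs."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], outputs))


clean = decode((0, 0, 1, 1, 0, 0, 1))      # exact codeword of class "C"
corrupted = decode((1, 0, 1, 1, 0, 0, 1))  # same outputs with the first bit flipped
```

Because the minimum distance between codewords in this matrix is 4, a single wrong binary classifier still leaves the correct class as the nearest codeword, which is the error-correcting property that gives the framework its name.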



	Author	Bogdan Raducanu; Fadi Dornaika
	Title	Appearance-based Face Recognition Using A Supervised Manifold Learning Framework			Type	Conference Article
	Year	2012	Publication	IEEE Workshop on the Applications of Computer Vision	Abbreviated Journal
	Volume		Issue		Pages	465-470
	Keywords
	Abstract	Many natural image sets, depicting objects whose appearance is changing due to motion, pose or light variations, can be considered samples of a low-dimension nonlinear manifold embedded in the high-dimensional observation space (the space of all possible images). The main contribution of our work is represented by a Supervised Laplacian Eigenmaps (S-LE) algorithm, which exploits the class label information for mapping the original data in the embedded space. Our proposed approach benefits from two important properties: i) it is discriminative, and ii) it adaptively selects the neighbors of a sample without using any predefined neighborhood size. Experiments were conducted on four face databases and the results demonstrate that the proposed algorithm significantly outperforms many linear and non-linear embedding techniques. Although we've focused on the face recognition problem, the proposed approach could also be extended to other categories of objects characterized by large variance in their appearance.
	Address	Breckenridge; CO; USA
	Corporate Author				Thesis
	Publisher	IEEE Xplore	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1550-5790	ISBN	978-1-4673-0233-3	Medium
	Area		Expedition		Conference	WACV
	Notes	OR;MV			Approved	no
	Call Number	Admin @ si @ RaD2012d			Serial	1890



	Author	Juan Ignacio Toledo; Manuel Carbonell; Alicia Fornes; Josep Llados
	Title	Information Extraction from Historical Handwritten Document Images with a Context-aware Neural Model			Type	Journal Article
	Year	2019	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	86	Issue		Pages	27-36
	Keywords	Document image analysis; Handwritten documents; Named entity recognition; Deep neural networks
	Abstract	Many historical manuscripts that hold trustworthy memories of the past societies contain information organized in a structured layout (e.g. census, birth or marriage records). The precious information stored in these documents cannot be effectively used or accessed without costly annotation efforts. The transcription driven by the semantic categories of words is crucial for the subsequent access. In this paper we describe an approach to extract information from structured historical handwritten text images and build a knowledge representation for the extraction of meaning out of historical data. The method extracts information, such as named entities, without the need of an intermediate transcription step, thanks to the incorporation of context information through language models. Our system has two variants: the first is based on bigrams, whereas the second is based on recurrent neural networks. Concretely, our second architecture integrates a Convolutional Neural Network to model visual information from word images together with a Bidirectional Long Short Term Memory network to model the relation among the words. This integrated sequential approach is able to extract more information than just the semantic category (e.g. a semantic category can be associated to a person in a record). Our system is generic, it deals with out-of-vocabulary words by design, and it can be applied to structured handwritten texts from different domains. The method has been validated with the ICDAR IEHHR competition protocol, outperforming the existing approaches.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.097; 601.311; 603.057; 600.084; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ TCF2019			Serial	3166



	Author	Beata Megyesi; Bernhard Esslinger; Alicia Fornes; Nils Kopal; Benedek Lang; George Lasry; Karl de Leeuw; Eva Pettersson; Arno Wacker; Michelle Waldispuhl
	Title	Decryption of historical manuscripts: the DECRYPT project			Type	Journal Article
	Year	2020	Publication	Cryptologia	Abbreviated Journal	CRYPT
	Volume	44	Issue	6	Pages	545-559
	Keywords	automatic decryption; cipher collection; historical cryptology; image transcription
	Abstract	Many historians and linguists are working individually and in an uncoordinated fashion on the identification and decryption of historical ciphers. This is a time-consuming process as they often work without access to automatic methods and processes that can accelerate the decipherment. At the same time, computer scientists and cryptologists are developing algorithms to decrypt various cipher types without having access to a large number of original ciphertexts. In this paper, we describe the DECRYPT project, which aims at the creation of resources and tools for historical cryptology by bringing together the expertise of various disciplines for collecting data and exchanging methods, for faster progress in transcribing, decrypting and contextualizing historical encrypted manuscripts. We present our goals and work in progress on a general approach for analyzing historical encrypted manuscripts using standardized methods and a new set of state-of-the-art tools. We release the data and tools as open source, hoping that all mentioned disciplines will benefit from and contribute to the research infrastructure of historical cryptology.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ MEF2020			Serial	3347



	Author	Anjan Dutta; Josep Llados; Horst Bunke; Umapada Pal
	Title	Product graph-based higher order contextual similarities for inexact subgraph matching			Type	Journal Article
	Year	2018	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	76	Issue		Pages	596-611
	Keywords
	Abstract	Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched. Pairwise measurements usually consider local attributes but disregard contextual information involved in graph structures. We address this issue by proposing contextual similarities between pairs of nodes. This is done by considering the tensor product graph (TPG) of two graphs to be matched, where each node is an ordered pair of nodes of the operand graphs. Contextual similarities between a pair of nodes are computed by accumulating weighted walks (normalized pairwise similarities) terminating at the corresponding paired node in TPG. Once the contextual similarities are obtained, we formulate subgraph matching as a node and edge selection problem in TPG. We use contextual similarities to construct an objective function and optimize it with a linear programming approach. Since random walk formulation through TPG takes into account higher order information, it is not a surprise that we obtain more reliable similarities and better discrimination among the nodes and edges. Experimental results shown on synthetic as well as real benchmarks illustrate that higher order contextual similarities increase discriminating power and allow one to find approximate solutions to the subgraph matching problem.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 602.167; 600.097; 600.121			Approved	no
	Call Number	Admin @ si @ DLB2018			Serial	3083
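
The tensor product graph (TPG) construction described in the abstract above can be sketched as follows. This is a simplified illustration only: it assumes undirected, unweighted graphs, uses unit edge weights instead of the paper's normalized pairwise similarities, and accumulates walks up to a fixed length with a hypothetical decay factor. The function names and parameters are assumptions made for this sketch.

```python
from itertools import product


def tensor_product_graph(edges1, edges2):
    """Adjacency of the TPG: (u1, u2) ~ (v1, v2) iff u1 ~ v1 in G1 and u2 ~ v2 in G2."""
    adj = {}
    for (u1, v1), (u2, v2) in product(edges1, edges2):
        # Each pair of undirected edges yields two undirected TPG edges.
        for a, b in (((u1, u2), (v1, v2)), ((u1, v2), (v1, u2))):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def contextual_scores(adj, decay=0.5, max_len=3):
    """Accumulate decayed counts of walks of length 1..max_len ending at each TPG node."""
    walks = {n: 1.0 for n in adj}  # one walk of length 0 ends at every node
    score = {n: 0.0 for n in adj}
    for k in range(1, max_len + 1):
        # A walk of length k ending at n extends a walk of length k-1 ending at a neighbor.
        walks = {n: sum(walks[m] for m in adj[n]) for n in adj}
        for n in adj:
            score[n] += decay ** k * walks[n]
    return score


# G1 is the path a-b-c, G2 is the single edge x-y.
tpg = tensor_product_graph([("a", "b"), ("b", "c")], [("x", "y")])
scores = contextual_scores(tpg)
```

In this toy case the paired nodes that involve the central node b receive higher scores than those involving the end nodes, because more walks terminate there. The node and edge selection step and the linear programming formulation from the abstract are not covered by this sketch.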



	Author	Gabriel Villalonga
	Title	Leveraging Synthetic Data to Create Autonomous Driving Perception Systems			Type	Book Whole
	Year	2021	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Manually annotating images to develop vision models has been a major bottleneck since computer vision and machine learning started to walk together. This has been more evident since computer vision came to rely on data-hungry deep learning techniques. When addressing on-board perception for autonomous driving, the curse of data annotation is exacerbated due to the use of additional sensors such as LiDAR. Therefore, any approach aiming at reducing such time-consuming and costly work is of high interest for addressing autonomous driving and, in fact, for any application requiring some sort of artificial perception. In the last decade, it has been shown that leveraging synthetic data is a paradigm worth pursuing in order to minimize manual data annotation. The reason is that the automatic process of generating synthetic data can also produce different types of associated annotations (e.g. object bounding boxes for synthetic images and LiDAR pointclouds, pixel/point-wise semantic information, etc.). Directly using synthetic data for training deep perception models may not be the definitive solution in all circumstances since a synth-to-real domain shift can appear. In this context, this work focuses on leveraging synthetic data to alleviate manual annotation for three perception tasks related to driving assistance and autonomous driving. In all cases, we assume the use of deep convolutional neural networks (CNNs) to develop our perception models. The first task addresses traffic sign recognition (TSR), a kind of multi-class classification problem. We assume that the number of sign classes to be recognized must be suddenly increased without having annotated samples to perform the corresponding TSR CNN re-training. We show that, by leveraging synthetic samples of such new classes and transforming them by a generative adversarial network (GAN) trained on the known classes (i.e. without using samples from the new classes), it is possible to re-train the TSR CNN to properly classify all the signs for a ~1/4 ratio of new/known sign classes. The second task addresses on-board 2D object detection, focusing on vehicles and pedestrians. In this case, we assume that we receive a set of images without the annotations required to train an object detector, i.e. without object bounding boxes. Therefore, our goal is to self-annotate these images so that they can later be used to train the desired object detector. In order to reach this goal, we leverage synthetic data and propose a semi-supervised learning approach based on the co-training idea. In fact, we use a GAN to reduce the synth-to-real domain shift before applying co-training. Our quantitative results show that co-training and GAN-based image-to-image translation complement each other to the point of allowing the training of object detectors without manual annotation, while still almost reaching the upper-bound performance of the detectors trained from human annotations. While in previous tasks we focus on vision-based perception, the third task we address focuses on LiDAR pointclouds. Our initial goal was to develop a 3D object detector trained on synthetic LiDAR-style pointclouds. While for images we may expect synth/real-to-real domain shift due to differences in their appearance (e.g. when source and target images come from different camera sensors), we did not expect it for LiDAR pointclouds since these active sensors factor out appearance and provide sampled shapes. However, in practice, we have seen that there can be domain shift even among real-world LiDAR pointclouds. Factors such as the sampling parameters of the LiDARs, the sensor suite configuration onboard the ego-vehicle, and the human annotation of 3D bounding boxes, do induce a domain shift. We show it through comprehensive experiments with different publicly available datasets and 3D detectors. This redirected our goal towards the design of a GAN for pointcloud-to-pointcloud translation, a relatively unexplored topic. Finally, it is worth mentioning that all the synthetic datasets used for these three tasks have been designed and generated in the context of this PhD work and will be publicly released. Overall, we think this PhD presents several steps forward to encourage leveraging synthetic data for developing deep perception models in the field of driving assistance and autonomous driving.
	Address	February 2021
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;German Ros
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-122714-2-3	Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Vil2021			Serial	3599



	Author	Jialuo Chen; M.A.Souibgui; Alicia Fornes; Beata Megyesi
	Title	A Web-based Interactive Transcription Tool for Encrypted Manuscripts			Type	Conference Article
	Year	2020	Publication	3rd International Conference on Historical Cryptology	Abbreviated Journal
	Volume		Issue		Pages	52-59
	Keywords
	Abstract	Manual transcription of handwritten text is a time consuming task. In the case of encrypted manuscripts, the recognition is even more complex due to the huge variety of alphabets and symbol sets. To speed up and ease this process, we present a web-based tool aimed to (semi)-automatically transcribe the encrypted sources. The user uploads one or several images of the desired encrypted document(s) as input, and the system returns the transcription(s). This process is carried out in an interactive fashion with the user to obtain more accurate results. For discovering and testing, the developed web tool is freely available.
	Address	Virtual; June 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	HistoCrypt
	Notes	DAG; 600.140; 602.230; 600.121			Approved	no
	Call Number	Admin @ si @ CSF2020			Serial	3447



	Author	Fadi Dornaika; Bogdan Raducanu
	Title	Out-of-Sample Embedding for Manifold Learning Applied to Face Recognition			Type	Conference Article
	Year	2013	Publication	IEEE International Workshop on Analysis and Modeling of Faces and Gestures	Abbreviated Journal
	Volume		Issue		Pages	862-868
	Keywords
	Abstract	Manifold learning techniques are affected by two critical aspects: (i) the design of the adjacency graphs, and (ii) the embedding of new test data (the out-of-sample problem). For the first aspect, the proposed schemes were heuristically driven. For the second aspect, the difficulty resides in finding an accurate mapping that transfers unseen data samples into an existing manifold. Past works addressing these two aspects were heavily parametric in the sense that the optimal performance is only reached for a suitable parameter choice that should be known in advance. In this paper, we demonstrate that sparse coding theory not only serves for automatic graph reconstruction as shown in recent works, but also represents an accurate alternative for out-of-sample embedding. Considering for a case study the Laplacian Eigenmaps, we applied our method to the face recognition problem. To evaluate the effectiveness of the proposed out-of-sample embedding, experiments are conducted using the k-nearest neighbor (KNN) and Kernel Support Vector Machines (KSVM) classifiers on four public face databases. The experimental results show that the proposed model is able to achieve high categorization effectiveness as well as high consistency with non-linear embeddings/manifolds obtained in batch modes.
	Address	Portland; USA; June 2013
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPRW
	Notes	OR; 600.046;MV			Approved	no
	Call Number	Admin @ si @ DoR2013			Serial	2236



	Author	German Ros
	Title	Visual Scene Understanding for Autonomous Vehicles: Understanding Where and What			Type	Book Whole
	Year	2016	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Making Ground Autonomous Vehicles (GAVs) a reality as a service for the society is one of the major scientific and technological challenges of this century. The potential benefits of autonomous vehicles include reducing accidents, easing traffic congestion and better usage of road infrastructures, among others. These vehicles must operate in our cities, towns and highways, dealing with many different types of situations while respecting traffic rules and protecting human lives. GAVs are expected to deal with all types of scenarios and situations, coping with an uncertain and chaotic world. Therefore, in order to fulfill these demanding requirements GAVs need to be endowed with the capability of understanding their surroundings at many different levels, by means of affordable sensors and artificial intelligence. This capacity to understand the surroundings and the current situation that the vehicle is involved in is called scene understanding. In this work we investigate novel techniques to bring scene understanding to autonomous vehicles by combining the use of cameras as the main source of information (due to their versatility and affordability) and algorithms based on computer vision and machine learning. We investigate different degrees of understanding of the scene, starting from basic geometric knowledge about where the vehicle is within the scene. A robust and efficient estimation of the vehicle location and pose with respect to a map is one of the most fundamental steps towards autonomous driving. We study this problem from the point of view of robustness and computational efficiency, proposing key insights to improve current solutions. Then we advance to higher levels of abstraction to discover what is in the scene, by recognizing and parsing all the elements present in a driving scene, such as roads, sidewalks, pedestrians, etc. We investigate this problem, known as semantic segmentation, proposing new approaches to improve recognition accuracy and computational efficiency. We cover these points by focusing on key aspects such as: (i) how to leverage computation moving semantics to an offline process, (ii) how to train compact architectures based on deconvolutional networks to achieve their maximum potential, (iii) how to use virtual worlds in combination with domain adaptation to produce accurate models in a cost-effective fashion, and (iv) how to use transfer learning techniques to prepare models for new situations. We finally extend the previous level of knowledge, enabling systems to reason about what has changed in a scene with respect to a previous visit, which in turn allows for efficient and cost-effective map updating.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Angel Sappa;Julio Guerrero;Antonio Lopez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-1-8	Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	Admin @ si @ Ros2016			Serial	2860



	Author	I. Sorodoc; S. Pezzelle; A. Herbelot; Mariella Dimiccoli; R. Bernardi
	Title	Learning quantification from images: A structured neural architecture			Type	Journal Article
	Year	2018	Publication	Natural Language Engineering	Abbreviated Journal	NLE
	Volume	24	Issue	3	Pages	363-392
	Keywords
	Abstract	Major advances have recently been made in merging language and vision representations. Most tasks considered so far have confined themselves to the processing of objects and lexicalised relations amongst objects (content words). We know, however, that humans (even pre-school children) can abstract over raw multimodal data to perform certain types of higher level reasoning, expressed in natural language by function words. A case in point is given by their ability to learn quantifiers, i.e. expressions like "few", "some" and "all". From formal semantics and cognitive linguistics, we know that quantifiers are relations over sets which, as a simplification, we can see as proportions. For instance, in "most fish are red", "most" encodes the proportion of fish which are red fish. In this paper, we study how well current neural network strategies model such relations. We propose a task where, given an image and a query expressed by an object–property pair, the system must return a quantifier expressing which proportions of the queried object have the queried property. Our contributions are twofold. First, we show that the best performance on this task involves coupling state-of-the-art attention mechanisms with a network architecture mirroring the logical structure assigned to quantifiers by classic linguistic formalisation. Second, we introduce a new balanced dataset of image scenarios associated with quantification queries, which we hope will foster further research in this area.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; no menciona			Approved	no
	Call Number	Admin @ si @ SPH2018			Serial	3021



	Author	Amir A.Amini; Yasheng Chen; Mohamed Elayyadi; Petia Radeva
	Title	Tag Surface Reconstruction and Tracking of Myocardial Beads from SPAMM-MRI with Parametric B-Spline Surfaces			Type	Journal
	Year	2001	Publication	IEEE Transactions on Medical Imaging	Abbreviated Journal	TMI
	Volume	20	Issue	2	Pages	94-103
	Keywords	B-spline surfaces; cardiac motion; myocardial beads; myocardial infarction; tagged MRI
	Abstract	Magnetic resonance imaging (MRI) is unique in its ability to noninvasively and selectively alter tissue magnetization, and create tag planes intersecting image slices. The resulting grid of signal voids allows for tracking deformations of tissues in otherwise homogeneous-signal myocardial regions. In this paper, we propose a specific spatial modulation of magnetization (SPAMM) imaging protocol together with efficient techniques for measurement of three-dimensional (3-D) motion of material points of the human heart (referred to as myocardial beads) from images collected with the SPAMM method. The techniques make use of tagged images in orthogonal views by explicitly reconstructing 3-D B-spline surface representation of tag planes (tag planes in two orthogonal orientations intersecting the short-axis (SA) image slices and tag planes in an orientation orthogonal to the short-axis tag planes intersecting long-axis (LA) image slices). The developed methods allow for viewing deformations of 3-D tag surfaces, spatial correspondence of long-axis and short-axis image slice and tag positions, as well as nonrigid movement of myocardial beads as a function of time.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB			Approved	no
	Call Number	BCNPCL @ bcnpcl @ ACE2001; IAM @ iam @ ACE2001			Serial	180



	Author	Raul Gomez
	Title	Exploiting the Interplay between Visual and Textual Data for Scene Interpretation			Type	Book Whole
	Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Machine learning experimentation under controlled scenarios and standard datasets is necessary to compare algorithm performance by evaluating all of them in the same setup. However, experimentation on how those algorithms perform on unconstrained data and applied tasks to solve real world problems is also a must to ascertain how that research can contribute to our society. In this dissertation we experiment with the latest computer vision and natural language processing algorithms, applying them to multimodal scene interpretation. Particularly, we investigate how image and text understanding can be jointly exploited to address real world problems, focusing on learning from Social Media data. We address several tasks that involve image and textual information, discuss their characteristics and offer our experimentation conclusions. First, we work on detection of scene text in images. Then, we work with Social Media posts, exploiting the captions associated with images as supervision to learn visual features, which we apply to multimodal semantic image retrieval. Subsequently, we work with geolocated Social Media images with associated tags, experimenting on how to use the tags as supervision, on location sensitive image retrieval and on exploiting location information for image tagging. Finally, we work on a specific classification problem of Social Media publications consisting of an image and a text: Multimodal hate speech classification.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Dimosthenis Karatzas;Lluis Gomez;Jaume Gibert
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-7-1	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Gom20			Serial	3479



	Author	Antonio Esteban Lansaque
	Title	An Endoscopic Navigation System for Lung Cancer Biopsy			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Lung cancer is one of the most diagnosed cancers among men and women. Actually, lung cancer accounts for 13% of the total cases with a 5-year global survival rate in patients. Although early detection increases survival rate from 38% to 67%, accurate diagnosis remains a challenge. Pathological confirmation requires extracting a sample of the lesion tissue for its biopsy. The preferred procedure for tissue biopsy is called bronchoscopy. A bronchoscopy is an endoscopic technique for the internal exploration of airways which facilitates the performance of minimally invasive interventions with low risk for the patient. Recent advances in bronchoscopic devices have increased their use for minimally invasive diagnostic and intervention procedures, like lung cancer biopsy sampling. Despite the improvement in bronchoscopic device quality, there is a lack of intelligent computational systems for supporting in-vivo clinical decisions during examinations. Existing technologies fail to accurately reach the lesion due to several aspects at intervention off-line planning and poor intra-operative guidance at exploration time. Existing guiding systems expose patients and clinical staff to radiation, might be expensive and achieve a suboptimal 70% of yield boost. Diagnostic yield could be improved, while reducing radiation and costs, by developing intra-operative support systems able to guide the bronchoscopist to the lesion during the intervention. The goal of this PhD thesis is to develop an image-based navigation system for intra-operative guidance of bronchoscopists to a target lesion across a path previously planned on a CT-scan. We propose a 3D navigation system which uses the anatomy of video bronchoscopy frames to locate the bronchoscope within the airways. Once the bronchoscope is located, our navigation system is able to indicate the bifurcation which needs to be followed to reach the lesion. In order to facilitate an off-line validation as realistic as possible, we also present a method for augmenting simulated virtual bronchoscopies with the appearance of intra-operative videos. Experiments performed on augmented and intra-operative videos prove that our algorithm can be sped up for an on-line implementation in the operating room.
	Address	October 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Debora Gil;Carles Sanchez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-0-2	Medium
	Area		Expedition		Conference
	Notes	IAM; 600.139; 600.145			Approved	no
	Call Number	Admin @ si @ Est2019			Serial	3392



	Author	Mohamed Ali Souibgui; Ali Furkan Biten; Sounak Dey; Alicia Fornes; Yousri Kessentini; Lluis Gomez; Dimosthenis Karatzas; Josep Llados
	Title	One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition			Type	Conference Article
	Year	2022	Publication	Winter Conference on Applications of Computer Vision	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	Document Analysis
	Abstract	Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This appears, for example, in the case of historical ciphered manuscripts, which are usually written with invented alphabets to hide the content. Thus, in this paper we address this problem through a data generation technique based on Bayesian Program Learning (BPL). Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet. After generating symbols, we create synthetic lines to train state-of-the-art HTR architectures in a segmentation free fashion. Quantitative and qualitative analyses were carried out and confirm the effectiveness of the proposed method, achieving competitive results compared to the usage of real annotated data.
	Address	Virtual; January 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WACV
	Notes	DAG; 602.230; 600.140			Approved	no
	Call Number	Admin @ si @ SBD2022			Serial	3615