Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	2371–2385 of 3413 records found matching your query (RSS):

Search & Display Options

Select All Deselect All

[141–150] << 151 152 153 154 155 156 157 158 159 160 >> [161–170]

List View

Citations

Details

	Records
	Author	Lei Kang
	Title	Robust Handwritten Text Recognition in Scarce Labeling Scenarios: Disentanglement, Adaptation and Generation			Type	Book Whole
	Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Handwritten documents are not only preserved in historical archives but also widely used in administrative documents such as cheques and claims. With the rise of the deep learning era, many state-of-the-art approaches have achieved good performance on specific datasets for Handwritten Text Recognition (HTR). However, it is still challenging to solve real use cases because of the varied handwriting styles across different writers and the limited labeled data. Thus, both explorin a more robust handwriting recognition architectures and proposing methods to diminish the gap between the source and target data in an unsupervised way are demanded. In this thesis, firstly, we explore novel architectures for HTR, from Sequence-to-Sequence (Seq2Seq) method with attention mechanism to non-recurrent Transformer-based method. Secondly, we focus on diminishing the performance gap between source and target data in an unsupervised way. Finally, we propose a group of generative methods for handwritten text images, which could be utilized to increase the training set to obtain a more robust recognizer. In addition, by simply modifying the generative method and joining it with a recognizer, we end up with an effective disentanglement method to distill textual content from handwriting styles so as to achieve a generalized recognition performance. We outperform state-of-the-art HTR performances in the experimental results among different scientific and industrial datasets, which prove the effectiveness of the proposed methods. To the best of our knowledge, the non-recurrent recognizer and the disentanglement method are the first contributions in the handwriting recognition field. Furthermore, we have outlined the potential research lines, which would be interesting to explore in the future.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Alicia Fornes;Marçal Rusiñol;Mauricio Villegas
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-122714-0-9	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Kan20			Serial	3482
Permanent link to this record



	Author	Manuel Carbonell
	Title	Neural Information Extraction from Semi-structured Documents A			Type	Book Whole
	Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Sectors as fintech, legaltech or insurance process an inflow of millions of forms, invoices, id documents, claims or similar every day. Together with these, historical archives provide gigantic amounts of digitized documents containing useful information that needs to be stored in machine encoded text with a meaningful structure. This procedure, known as information extraction (IE) comprises the steps of localizing and recognizing text, identifying named entities contained in it and optionally finding relationships among its elements. In this work we explore multi-task neural models at image and graph level to solve all steps in a unified way. While doing so we find benefits and limitations of these end-to-end approaches in comparison with sequential separate methods. More specifically, we first propose a method to produce textual as well as semantic labels with a unified model from handwritten text line images. We do so with the use of a convolutional recurrent neural model trained with connectionist temporal classification to predict the textual as well as semantic information encoded in the images. Secondly, motivated by the success of this approach we investigate the unification of the localization and recognition tasks of handwritten text in full pages with an end-to-end model, observing benefits in doing so. Having two models that tackle information extraction subsequent task pairs in an end-to-end to end manner, we lastly contribute with a method to put them all together in a single neural network to solve the whole information extraction pipeline in a unified way. Doing so we observe some benefits and some limitations in the approach, suggesting that in certain cases it is beneficial to train specialized models that excel at a single challenging task of the information extraction process, as it can be the recognition of named entities or the extraction of relationships between them. For this reason we lastly study the use of the recently arrived graph neural network architectures for the semantic tasks of the information extraction process, which are recognition of named entities and relation extraction, achieving promising results on the relation extraction part.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Alicia Fornes;Mauricio Villegas;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-122714-1-6	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Car20			Serial	3483
Permanent link to this record



	Author	Riccardo Del Chiaro; Bartlomiej Twardowski; Andrew Bagdanov; Joost Van de Weijer
	Title	Recurrent attention to transient tasks for continual image captioning			Type	Conference Article
	Year	2020	Publication	34th Conference on Neural Information Processing Systems	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now surprisingly little attention has been focused on continual learning of recurrent models applied to problems like image captioning. In this paper we take a systematic look at continual learning of LSTM-based models for image captioning. We propose an attention-based approach that explicitly accommodates the transient nature of vocabularies in continual image captioning tasks -- i.e. that task vocabularies are not disjoint. We call our method Recurrent Attention to Transient Tasks (RATT), and also show how to adapt continual learning approaches based on weight egularization and knowledge distillation to recurrent continual learning problems. We apply our approaches to incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30 datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones.
	Address	virtual; December 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	NEURIPS
	Notes	LAMP; 600.120			Approved	no
	Call Number	Admin @ si @ CTB2020			Serial	3484
Permanent link to this record



	Author	Yaxing Wang; Lu Yu; Joost Van de Weijer
	Title	DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs			Type	Conference Article
	Year	2020	Publication	34th Conference on Neural Information Processing Systems	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Image-to-image translation has recently achieved remarkable results. But despite current success, it suffers from inferior performance when translations between classes require large shape changes. We attribute this to the high-resolution bottlenecks which are used by current state-of-the-art image-to-image methods. Therefore, in this work, we propose a novel deep hierarchical Image-to-Image Translation method, called DeepI2I. We learn a model by leveraging hierarchical features: (a) structural information contained in the shallow layers and (b) semantic information extracted from the deep layers. To enable the training of deep I2I models on small datasets, we propose a novel transfer learning method, that transfers knowledge from pre-trained GANs. Specifically, we leverage the discriminator of a pre-trained GANs (i.e. BigGAN or StyleGAN) to initialize both the encoder and the discriminator and the pre-trained generator to initialize the generator of our model. Applying knowledge transfer leads to an alignment problem between the encoder and generator. We introduce an adaptor network to address this. On many-class image-to-image translation on three datasets (Animal faces, Birds, and Foods) we decrease mFID by at least 35% when compared to the state-of-the-art. Furthermore, we qualitatively and quantitatively demonstrate that transfer learning significantly improves the performance of I2I systems, especially for small datasets. Finally, we are the first to perform I2I translations for domains with over 100 classes.
	Address	virtual; December 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	NEURIPS
	Notes	LAMP; 600.120			Approved	no
	Call Number	Admin @ si @ WYW2020			Serial	3485
Permanent link to this record



	Author	Yaxing Wang; Salman Khan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan
	Title	Semi-supervised Learning for Few-shot Image-to-Image Translation			Type	Conference Article
	Year	2020	Publication	33rd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In the last few years, unpaired image-to-image translation has witnessed remarkable progress. Although the latest methods are able to generate realistic images, they crucially rely on a large number of labeled images. Recently, some methods have tackled the challenging setting of few-shot image-to-image translation, reducing the labeled data requirements for the target domain during inference. In this work, we go one step further and reduce the amount of required labeled data also from the source domain during training. To do so, we propose applying semi-supervised learning via a noise-tolerant pseudo-labeling procedure. We also apply a cycle consistency constraint to further exploit the information from unlabeled images, either from the same dataset or external. Additionally, we propose several structural modifications to facilitate the image translation task under these circumstances. Our semi-supervised method for few-shot image translation, called SEMIT, achieves excellent results on four different datasets using as little as 10% of the source labels, and matches the performance of the main fully-supervised competitor using only 20% labeled data. Our code and models are made public at: this https URL.
	Address	Virtual; June 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPR
	Notes	LAMP; 600.120			Approved	no
	Call Number	Admin @ si @ WKG2020			Serial	3486
Permanent link to this record



	Author	Yi Xiao; Felipe Codevilla; Christopher Pal; Antonio Lopez
	Title	Action-Based Representation Learning for Autonomous Driving			Type	Conference Article
	Year	2020	Publication	Conference on Robot Learning	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Human drivers produce a vast amount of data which could, in principle, be used to improve autonomous driving systems. Unfortunately, seemingly straightforward approaches for creating end-to-end driving models that map sensor data directly into driving actions are problematic in terms of interpretability, and typically have significant difficulty dealing with spurious correlations. Alternatively, we propose to use this kind of action-based driving data for learning representations. Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery and outperform pure end-to-end driving models, while being more interpretable. Further, we demonstrate how this strategy outperforms previous methods based on learning inverse dynamics models as well as other methods based on heavy human supervision (ImageNet).
	Address	virtual; November 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CORL
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ XCP2020			Serial	3487
Permanent link to this record



	Author	Gabriel Villalonga; Antonio Lopez
	Title	Co-Training for On-Board Deep Object Detection			Type	Journal Article
	Year	2020	Publication	IEEE Access	Abbreviated Journal	ACCESS
	Volume		Issue		Pages	194441 - 194456
	Keywords
	Abstract	Providing ground truth supervision to train visual models has been a bottleneck over the years, exacerbated by domain shifts which degenerate the performance of such models. This was the case when visual tasks relied on handcrafted features and shallow machine learning and, despite its unprecedented performance gains, the problem remains open within the deep learning paradigm due to its data-hungry nature. Best performing deep vision-based object detectors are trained in a supervised manner by relying on human-labeled bounding boxes which localize class instances (i.e. objects) within the training images. Thus, object detection is one of such tasks for which human labeling is a major bottleneck. In this article, we assess co-training as a semi-supervised learning method for self-labeling objects in unlabeled images, so reducing the human-labeling effort for developing deep object detectors. Our study pays special attention to a scenario involving domain shift; in particular, when we have automatically generated virtual-world images with object bounding boxes and we have real-world images which are unlabeled. Moreover, we are particularly interested in using co-training for deep object detection in the context of driver assistance systems and/or self-driving vehicles. Thus, using well-established datasets and protocols for object detection in these application contexts, we will show how co-training is a paradigm worth to pursue for alleviating object labeling, working both alone and together with task-agnostic domain adaptation.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ ViL2020			Serial	3488
Permanent link to this record



	Author	Hannes Mueller; Andre Groger; Jonathan Hersh; Andrea Matranga; Joan Serrat
	Title	Monitoring War Destruction from Space: A Machine Learning Approach			Type	Miscellaneous
	Year	2020	Publication	Arxiv	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Existing data on building destruction in conflict zones rely on eyewitness reports or manual detection, which makes it generally scarce, incomplete and potentially biased. This lack of reliable data imposes severe limitations for media reporting, humanitarian relief efforts, human rights monitoring, reconstruction initiatives, and academic studies of violent conflict. This article introduces an automated method of measuring destruction in high-resolution satellite images using deep learning techniques combined with data augmentation to expand training samples. We apply this method to the Syrian civil war and reconstruct the evolution of damage in major cities across the country. The approach allows generating destruction data with unprecedented scope, resolution, and frequency – only limited by the available satellite imagery – which can alleviate data limitations decisively.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ MGH2020			Serial	3489
Permanent link to this record



	Author	Yi Xiao; Felipe Codevilla; Akhil Gurram; Onay Urfalioglu; Antonio Lopez
	Title	Multimodal end-to-end autonomous driving			Type	Journal Article
	Year	2020	Publication	IEEE Transactions on Intelligent Transportation Systems	Abbreviated Journal	TITS
	Volume		Issue		Pages	1-11
	Keywords
	Abstract	A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception and maneuver planning and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from input raw sensor data to vehicle control signals. The later are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e. using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single-modality.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	Admin @ si @ XCG2020			Serial	3490
Permanent link to this record



	Author	Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas
	Title	Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval			Type	Conference Article
	Year	2021	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
	Volume		Issue		Pages	4022-4032
	Keywords
	Abstract
	Address	Virtual; January 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WACV
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ MDB2021			Serial	3491
Permanent link to this record



	Author	Andres Mafla; Rafael S. Rezende; Lluis Gomez; Diana Larlus; Dimosthenis Karatzas
	Title	StacMR: Scene-Text Aware Cross-Modal Retrieval			Type	Conference Article
	Year	2021	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
	Volume		Issue		Pages	2219-2229
	Keywords
	Abstract
	Address	Virtual; January 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WACV
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ MRG2021a			Serial	3492
Permanent link to this record



	Author	Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
	Title	Real-time Lexicon-free Scene Text Retrieval			Type	Journal Article
	Year	2021	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	110	Issue		Pages	107656
	Keywords
	Abstract	In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121; 600.129; 601.338			Approved	no
	Call Number	Admin @ si @ MTD2021			Serial	3493
Permanent link to this record



	Author	Lluis Gomez; Anguelos Nicolaou; Marçal Rusiñol; Dimosthenis Karatzas
	Title	12 years of ICDAR Robust Reading Competitions: The evolution of reading systems for unconstrained text understanding			Type	Book Chapter
	Year	2020	Publication	Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	K. Alahari; C.V. Jawahar
	Language		Summary Language		Original Title
	Series Editor		Series Title	Series on Advances in Computer Vision and Pattern Recognition	Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	GNR2020			Serial	3494
Permanent link to this record



	Author	Lluis Gomez; Dena Bazazian; Dimosthenis Karatzas
	Title	Historical review of scene text detection research			Type	Book Chapter
	Year	2020	Publication	Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	K. Alahari; C.V. Jawahar
	Language		Summary Language		Original Title
	Series Editor		Series Title	Series on Advances in Computer Vision and Pattern Recognition	Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ GBK2020			Serial	3495
Permanent link to this record



	Author	Jon Almazan; Lluis Gomez; Suman Ghosh; Ernest Valveny; Dimosthenis Karatzas
	Title	WATTS: A common representation of word images and strings using embedded attributes for text recognition and retrieval			Type	Book Chapter
	Year	2020	Publication	Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	Analysis”, K. Alahari; C.V. Jawahar
	Language		Summary Language		Original Title
	Series Editor		Series Title	Series on Advances in Computer Vision and Pattern Recognition	Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ AGG2020			Serial	3496
Permanent link to this record