Publicacions CVC -- Query Results

<< 1 2 3 4 5 6 7 8 9 >>

Details

Records
Author	Lei Kang; Marçal Rusiñol; Alicia Fornes; Pau Riba; Mauricio Villegas
Title	Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition			Type	Conference Article
Year	2020	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions in real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance, thus, it provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step.
Address	Aspen; Colorado; USA; March 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.129; 600.140; 601.302; 601.312; 600.121			Approved	no
Call Number	Admin @ si @ KRF2020			Serial	3446
Permanent link to this record



Author	Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas
Title	Exploring Hate Speech Detection in Multimodal Publications			Type	Conference Article
Year	2020	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.
Address	Aspen; March 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ GGG2020a			Serial	3280
Permanent link to this record



Author	Edgar Riba; D. Mishkin; Daniel Ponsa; E. Rublee; G. Bradski
Title	Kornia: an Open Source Differentiable Computer Vision Library for PyTorch			Type	Conference Article
Year	2020	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	Aspen; Colorado; USA; March 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	MSIAU; 600.122; 600.130			Approved	no
Call Number	Admin @ si @ RMP2020			Serial	3291
Permanent link to this record



Author	Sergio Escalera; Ralf Herbrich
Title	The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations			Type	Book Whole
Year	2020	Publication	The Springer Series on Challenges in Machine Learning	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor	Sergio Escalera; Ralf Hebrick
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	2520-1328	ISBN	978-3-030-29134-1	Medium
Area		Expedition		Conference
Notes	HuPBA; no menciona			Approved	no
Call Number	Admin @ si @ HeE2020			Serial	3328
Permanent link to this record



Author	Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas
Title	Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features			Type	Conference Article
Year	2020	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval.
Address	Aspen; Colorado; USA; March 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ MDB2020			Serial	3334
Permanent link to this record



Author	Arnau Baro; Alicia Fornes; Carles Badal
Title	Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism			Type	Conference Article
Year	2020	Publication	17th International Conference on Frontiers in Handwriting Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks.
Address	Virtual ICFHR; September 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICFHR
Notes	DAG; 600.140; 600.121			Approved	no
Call Number	Admin @ si @ BFB2020			Serial	3448
Permanent link to this record



Author	Alicia Fornes; Josep Llados; Joana Maria Pujadas-Mora
Title	Browsing of the Social Network of the Past: Information Extraction from Population Manuscript Images			Type	Book Chapter
Year	2020	Publication	Handwritten Historical Document Analysis, Recognition, and Retrieval – State of the Art and Future Trends	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher	World Scientific	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-981-120-323-7	Medium
Area		Expedition		Conference
Notes	DAG; 600.140; 600.121			Approved	no
Call Number	Admin @ si @ FLP2020			Serial	3350
Permanent link to this record



Author	Jialuo Chen; M.A.Souibgui; Alicia Fornes; Beata Megyesi
Title	A Web-based Interactive Transcription Tool for Encrypted Manuscripts			Type	Conference Article
Year	2020	Publication	3rd International Conference on Historical Cryptology	Abbreviated Journal
Volume		Issue		Pages	52-59
Keywords
Abstract	Manual transcription of handwritten text is a time consuming task. In the case of encrypted manuscripts, the recognition is even more complex due to the huge variety of alphabets and symbol sets. To speed up and ease this process, we present a web-based tool aimed to (semi)-automatically transcribe the encrypted sources. The user uploads one or several images of the desired encrypted document(s) as input, and the system returns the transcription(s). This process is carried out in an interactive fashion with the user to obtain more accurate results. For discovering and testing, the developed web tool is freely available.
Address	Virtual; June 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	HistoCrypt
Notes	DAG; 600.140; 602.230; 600.121			Approved	no
Call Number	Admin @ si @ CSF2020			Serial	3447
Permanent link to this record



Author	David Berga; Xavier Otazu
Title	Computations of top-down attention by modulating V1 dynamics			Type	Conference Article
Year	2020	Publication	Computational and Mathematical Models in Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	St. Pete Beach; Florida; May 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	MODVIS
Notes	NEUROBIT			Approved	no
Call Number	Admin @ si @ BeO2020a			Serial	3376
Permanent link to this record



Author	Yaxing Wang
Title	Transferring and Learning Representations for Image Generation and Translation			Type	Book Whole
Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods which perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first one, called simply image generation takes random noise as an input and synthesizes an image which follows the same distribution as the images in the training set. The second class, which is called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired image-to-image translation based on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image generation.GANs highly rely upon having access to vast quantities of data, and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we aim to transfer knowledge from a model trained on a large dataset (source domain) to the one learned on limited data (target domain). We find that both GANs andconditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than the generator. Using both the generator and discriminator results in the best performance. We found, however, that this method suffers from overfitting, since we update all parameters to adapt to the target data. We propose a novel architecture, which is tailored to address knowledge transfer to very small target domains. Our approach effectively exploreswhich part of the latent space is more related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs. Although image-to-image translation has achieved outstanding performance, it still facesseveral problems. First, for translation between complex domains (such as translations between different modalities) image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen (i.e. during training), we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach where we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascadingthe source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which are dueto the fact that the target dataset has a particular underlying visual distribution. We use carefully designed semantic constraints to reduce the effects of the bias. The semantic constraint aims to enforce the preservation of desired image properties. Finally, current approaches fail to generate diverse outputs or perform scalable image transfer in a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation. We employ random noise to control the diversity. The scalabitlity is determined by conditioning the domain label.computer vision, deep learning, imitation learning, adversarial generative networks, image generation, image-to-image translation.
Address	January 2020
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Abel Gonzalez;Luis Herranz
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-121011-5-7	Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ Wan2020			Serial	3397
Permanent link to this record



Author	Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
Title	RoadText-1K: Text Detection and Recognition Dataset for Driving Videos			Type	Conference Article
Year	2020	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new ”RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/ projects/cvit-projects/roadtext-1k
Address	Paris; Francia; ???
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICRA
Notes	DAG; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ RMG2020			Serial	3400
Permanent link to this record



Author	Lorenzo Porzi; Markus Hofinger; Idoia Ruiz; Joan Serrat; Samuel Rota Bulo; Peter Kontschieder
Title	Learning Multi-Object Tracking and Segmentation from Automatic Annotations			Type	Conference Article
Year	2020	Publication	33rd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	6845-6854
Keywords
Abstract	In this work we contribute a novel pipeline to automatically generate training data, and to improve over state-of-the-art multi-object tracking and segmentation (MOTS) methods. Our proposed track mining algorithm turns raw street-level videos into high-fidelity MOTS training data, is scalable and overcomes the need of expensive and time-consuming manual annotation approaches. We leverage state-of-the-art instance segmentation results in combination with optical flow predictions, also trained on automatically harvested training data. Our second major contribution is MOTSNet – a deep learning, tracking-by-detection architecture for MOTS – deploying a novel mask-pooling layer for improved object association over time. Training MOTSNet with our automatically extracted data leads to significantly improved sMOTSA scores on the novel KITTI MOTS dataset (+1.9%/+7.5% on cars/pedestrians), and MOTSNet improves by +4.1% over previously best methods on the MOTSChallenge dataset. Our most impressive finding is that we can improve over previous best-performing works, even in complete absence of manually annotated MOTS training data.
Address	virtual; June 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPR
Notes	ADAS; 600.124; 600.118			Approved	no
Call Number	Admin @ si @ PHR2020			Serial	3402
Permanent link to this record



Author	Vacit Oguz Yazici; Abel Gonzalez-Garcia; Arnau Ramisa; Bartlomiej Twardowski; Joost Van de Weijer
Title	Orderless Recurrent Models for Multi-label Classification			Type	Conference Article
Year	2020	Publication	33rd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Recurrent neural networks (RNN) are popular for many computer vision tasks, including multi-label classification. Since RNNs produce sequential outputs, labels need to be ordered for the multi-label classification task. Current approaches sort labels according to their frequency, typically ordering them in either rare-first or frequent-first. These imposed orderings do not take into account that the natural order to generate the labels can change for each image, e.g.\ first the dominant object before summing up the smaller objects in the image. Therefore, in this paper, we propose ways to dynamically order the ground truth labels with the predicted label sequence. This allows for the faster training of more optimal LSTM models for multi-label classification. Analysis evidences that our method does not suffer from duplicate generation, something which is common for other models. Furthermore, it outperforms other CNN-RNN models, and we show that a standard architecture of an image encoder and language decoder trained with our proposed loss obtains the state-of-the-art results on the challenging MS-COCO, WIDER Attribute and PA-100K and competitive results on NUS-WIDE.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPR
Notes	LAMP; 600.109; 601.309; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ YGR2020			Serial	3408
Permanent link to this record



Author	Xialei Liu; Chenshen Wu; Mikel Menta; Luis Herranz; Bogdan Raducanu; Andrew Bagdanov; Shangling Jui; Joost Van de Weijer
Title	Generative Feature Replay for Class-Incremental Learning			Type	Conference Article
Year	2020	Publication	CLVISION – Workshop on Continual Learning in Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Humans are capable of learning new tasks without forgetting previous ones, while neural networks fail due to catastrophic forgetting between new and previously-learned tasks. We consider a class-incremental setting which means that the task-ID is unknown at inference time. The imbalance between old and new classes typically results in a bias of the network towards the newest ones. This imbalance problem can either be addressed by storing exemplars from previous tasks, or by using image replay methods. However, the latter can only be applied to toy datasets since image generation for complex datasets is a hard problem. We propose a solution to the imbalance problem based on generative feature replay which does not require any exemplars. To do this, we split the network into two parts: a feature extractor and a classifier. To prevent forgetting, we combine generative feature replay in the classifier with feature distillation in the feature extractor. Through feature generation, our method reduces the complexity of generative replay and prevents the imbalance problem. Our approach is computationally efficient and scalable to large datasets. Experiments confirm that our approach achieves state-of-the-art results on CIFAR-100 and ImageNet, while requiring only a fraction of the storage needed for exemplar-based continual learning
Address	Virtual CVPR
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	LAMP; 601.309; 602.200; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ LWM2020			Serial	3419
Permanent link to this record



Author	Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas
Title	Location Sensitive Image Retrieval and Tagging			Type	Conference Article
Year	2020	Publication	16th European Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location a relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging.
Address	Virtual; August 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ECCV
Notes	DAG; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ GGG2020b			Serial	3420
Permanent link to this record