|
Records |
Links |
|
Author |
Julio C. S. Jacques Junior; Cagri Ozcinar; Marina Marjanovic; Xavier Baro; Gholamreza Anbarjafari; Sergio Escalera |
|
|
Title |
On the effect of age perception biases for real age regression |
Type |
Conference Article |
|
Year |
2019 |
Publication |
14th IEEE International Conference on Automatic Face and Gesture Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Automatic age estimation from facial images represents an important task in computer vision. This paper analyses the effect of gender, age, ethnicity, makeup and expression attributes of faces as sources of bias to improve deep apparent age prediction. Following recent works showing that apparent age labels benefit real age estimation more than direct real-to-real age regression, our main contribution is the integration, in an end-to-end architecture, of face attributes for apparent age prediction with an additional loss for real age regression. Experimental results on the APPA-REAL dataset indicate that the proposed network successfully takes advantage of the adopted attributes to improve both apparent and real age estimation. Our model outperformed a state-of-the-art architecture proposed to separately address apparent and real age regression. Finally, we present preliminary results and discussion of a proof-of-concept application using the proposed model to regress the apparent age of an individual based on the gender of an external observer. |
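The joint objective described in the abstract can be sketched as a weighted sum of an apparent-age term and a real-age regression term. The squared-error form and the weight `lam` are illustrative assumptions, not the paper's exact losses:

```python
def combined_age_loss(pred_apparent, apparent_label, pred_real, real_label, lam=1.0):
    """Apparent-age loss plus an additional real-age regression loss, combined
    into one end-to-end objective (squared error and the weight lam are
    illustrative choices, not the paper's exact formulation)."""
    apparent_term = (pred_apparent - apparent_label) ** 2
    real_term = (pred_real - real_label) ** 2
    return apparent_term + lam * real_term
```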
|
|
Address |
Lille; France; May 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
FG |
|
|
Notes |
HuPBA; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ JOM2019 |
Serial |
3262 |
|
Permanent link to this record |
|
|
|
|
Author |
Bojana Gajic; Ariel Amato; Ramon Baldrich; Carlo Gatta |
|
|
Title |
Bag of Negatives for Siamese Architectures |
Type |
Conference Article |
|
Year |
2019 |
Publication |
30th British Machine Vision Conference |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Training a Siamese architecture for re-identification with a large number of identities is a challenging task due to the difficulty of finding relevant negative samples efficiently. In this work we present Bag of Negatives (BoN), a method for accelerated and improved training of Siamese networks that scales well on datasets with a very large number of identities. BoN is an efficient and loss-independent method, able to select a bag of high quality negatives, based on a novel online hashing strategy. |
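The negative-selection idea can be sketched with a sign-based random-projection hash: samples of different identities that collide in the same bucket are likely hard negatives, found without any pairwise distance computation. The generic LSH scheme below is a simplified stand-in, not the paper's exact online hashing strategy:

```python
def lsh_sign_hash(embedding, planes):
    """Binary hash code from the sign of random projections; nearby
    embeddings tend to share codes (locality-sensitive hashing)."""
    return tuple(int(sum(e * p for e, p in zip(embedding, plane)) >= 0)
                 for plane in planes)

def bag_of_negatives(embeddings, labels, planes):
    """Group samples by hash code; identities sharing a bucket are likely
    hard negatives for each other (a simplified sketch of the BoN idea)."""
    buckets = {}
    for emb, lab in zip(embeddings, labels):
        buckets.setdefault(lsh_sign_hash(emb, planes), []).append(lab)
    return buckets
```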
|
|
Address |
Cardiff; United Kingdom; September 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
BMVC |
|
|
Notes |
CIC; 600.140; 600.118 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GAB2019b |
Serial |
3263 |
|
Permanent link to this record |
|
|
|
|
Author |
Raul Gomez; Ali Furkan Biten; Lluis Gomez; Jaume Gibert; Marçal Rusiñol; Dimosthenis Karatzas |
|
|
Title |
Selective Style Transfer for Text |
Type |
Conference Article |
|
Year |
2019 |
Publication |
15th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
805-812 |
|
|
Keywords |
transfer; text style transfer; data augmentation; scene text detection |
|
|
Abstract |
This paper explores the possibilities of image style transfer applied to text, maintaining the original transcriptions. Results on different text domains (scene text, machine-printed text and handwritten text) and cross-modal results demonstrate that this is feasible and open up different research lines. Furthermore, two architectures for selective style transfer, i.e. transferring style only to the desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost in text detector performance. Our implementation of the described models is publicly available. |
|
|
Address |
Sydney; Australia; September 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.129; 600.135; 601.338; 601.310; 600.121 |
Approved |
no |
|
|
Call Number |
GBG2019 |
Serial |
3265 |
|
Permanent link to this record |
|
|
|
|
Author |
Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas |
|
|
Title |
Self-Supervised Learning from Web Data for Multimodal Retrieval |
Type |
Book Chapter |
|
Year |
2019 |
Publication |
Multi-Modal Scene Understanding Book |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
279-306 |
|
|
Keywords |
self-supervised learning; webly supervised learning; text embeddings; multimodal retrieval; multimodal embedding |
|
|
Abstract |
Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need of human annotated data. Web and social media platforms provide a virtually unlimited amount of this multimodal data. In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision, and analyze the semantic structure of the learnt joint image and text embedding space. We perform a thorough analysis and performance comparison of five different state-of-the-art text embeddings in three different benchmarks. We show that the embeddings learnt with web and social media data have competitive performance over supervised methods in the text-based image retrieval task, and we clearly outperform the state of the art in the MIRFlickr dataset when training in the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, that can be used for a fair comparison of image-text embeddings. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.129; 601.338; 601.310 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GGG2019 |
Serial |
3266 |
|
Permanent link to this record |
|
|
|
|
Author |
Rafael E. Rivadeneira; Patricia Suarez; Angel Sappa; Boris X. Vintimilla |
|
|
Title |
Thermal Image SuperResolution Through Deep Convolutional Neural Network |
Type |
Conference Article |
|
Year |
2019 |
Publication |
16th International Conference on Image Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
417-426 |
|
|
Keywords |
|
|
|
Abstract |
Due to the lack of thermal image datasets, a new dataset has been acquired in order to propose a super-resolution approach based on a Deep Convolutional Neural Network schema. Different experiments have been carried out: first, the proposed architecture was trained using only images from the visible spectrum; later, it was trained with images from the thermal spectrum. The results show that the network trained with thermal images obtains better results in the image enhancement process, maintaining image details and perspective. The thermal dataset is available at http://www.cidis.espol.edu.ec/es/dataset. |
|
|
Address |
Waterloo; Canada; August 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICIAR |
|
|
Notes |
MSIAU; 600.130; 601.349; 600.122 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RSS2019 |
Serial |
3269 |
|
Permanent link to this record |
|
|
|
|
Author |
Angel Morera; Angel Sanchez; Angel Sappa; Jose F. Velez |
|
|
Title |
Robust Detection of Outdoor Urban Advertising Panels in Static Images |
Type |
Conference Article |
|
Year |
2019 |
Publication |
18th International Conference on Practical Applications of Agents and Multi-Agent Systems |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
246-256 |
|
|
Keywords |
Object detection; Urban ads panels; Deep learning; Single Shot Detector (SSD) architecture; Intersection over Union (IoU) metric; Augmented Reality |
|
|
Abstract |
One interesting publicity application for Smart City environments is recognizing brand information contained in urban advertising panels. For such a purpose, a previous stage is to accurately detect and locate the position of these panels in images. This work presents an effective solution to this problem using a Single Shot Detector (SSD) based on a deep neural network architecture that minimizes the number of false detections under multiple variable conditions regarding the panels and the scene. Achieved experimental results using the Intersection over Union (IoU) accuracy metric make this proposal applicable in real complex urban images. |
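The Intersection over Union metric used for evaluation has a standard definition; a minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` form:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2):
    overlap area divided by the area covered by either box."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with the ground-truth panel exceeds a threshold such as 0.5.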
|
|
Address |
Aquila; Italy; June 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
PAAMS |
|
|
Notes |
MSIAU; 600.130; 600.122 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MSS2019 |
Serial |
3270 |
|
Permanent link to this record |
|
|
|
|
Author |
Armin Mehri; Angel Sappa |
|
|
Title |
Colorizing Near Infrared Images through a Cyclic Adversarial Approach of Unpaired Samples |
Type |
Conference Article |
|
Year |
2019 |
Publication |
IEEE International Conference on Computer Vision and Pattern Recognition-Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This paper presents a novel approach for colorizing near infrared (NIR) images. The approach is based on image-to-image translation, using a Cycle-Consistent adversarial network to learn the color channels from an unpaired dataset, which removes the need for paired training samples. The approach uses tailored generator networks that require less computation time, converge faster and generate high quality samples. The obtained results have been evaluated quantitatively, using standard evaluation metrics, and qualitatively, showing considerable improvements with respect to the state of the art. |
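The cycle-consistency constraint that makes unpaired training possible can be sketched as an L1 reconstruction error after mapping NIR to color and back. The generators `g` (NIR to color) and `f` (color to NIR) below are placeholder callables, not the paper's networks:

```python
def cycle_consistency_loss(nir, g, f):
    """L1 error between a NIR sample and its reconstruction after the
    cycle NIR -> color (g) -> NIR (f); driving this term to zero is what
    lets the generators train without paired NIR/color samples."""
    reconstructed = f(g(nir))
    return sum(abs(a - b) for a, b in zip(nir, reconstructed))
```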
|
|
Address |
Long Beach; California; USA; June 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPRW |
|
|
Notes |
MSIAU; 600.130; 601.349; 600.122 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MeS2019 |
Serial |
3271 |
|
Permanent link to this record |
|
|
|
|
Author |
Patricia Suarez; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud |
|
|
Title |
Image Vegetation Index through a Cycle Generative Adversarial Network |
Type |
Conference Article |
|
Year |
2019 |
Publication |
IEEE International Conference on Computer Vision and Pattern Recognition-Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This paper proposes a novel approach to estimate the Normalized Difference Vegetation Index (NDVI) just from an RGB image. The NDVI values are obtained by using images from the visible spectral band together with a synthetic near infrared image obtained by a cycled GAN, which is able to produce a NIR image from a given grayscale image. It is trained on an unpaired set of grayscale and NIR images, using a U-net architecture and a multiple-term loss function (the grayscale images are obtained from the provided RGB images). The NIR image estimated with the proposed cycle generative adversarial network is then used to compute the NDVI index. Experimental results showing the validity of the proposed approach are provided, together with comparisons with previous approaches. |
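The final NDVI computation follows the standard definition, contrasting red reflectance with the (here synthesized) near-infrared band:

```python
def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index per pixel:
    NDVI = (NIR - Red) / (NIR + Red); eps avoids division by zero
    on pixels where both bands are dark."""
    return [[(n - r) / (n + r + eps) for n, r in zip(n_row, r_row)]
            for n_row, r_row in zip(nir, red)]
```

Values near 1 indicate dense vegetation; values near 0 or below indicate bare soil, water or built surfaces.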
|
|
Address |
Long Beach; California; USA; June 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPRW |
|
|
Notes |
MSIAU; 600.130; 601.349; 600.122 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SSV2019 |
Serial |
3272 |
|
Permanent link to this record |
|
|
|
|
Author |
Arnau Baro; Jialuo Chen; Alicia Fornes; Beata Megyesi |
|
|
Title |
Towards a generic unsupervised method for transcription of encoded manuscripts |
Type |
Conference Article |
|
Year |
2019 |
Publication |
3rd International Conference on Digital Access to Textual Cultural Heritage |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
73-78 |
|
|
Keywords |
|
|
Abstract |
Historical ciphers, a special type of manuscript, contain encrypted information that is important for the interpretation of our history. The first step towards decipherment is to transcribe the images, either manually or by automatic image processing techniques. Despite the improvements in handwritten text recognition (HTR) thanks to deep learning methodologies, the need for labelled training data is an important limitation. Given that ciphers often use symbol sets across various alphabets and unique symbols without any transcription scheme available, these supervised HTR techniques are not suitable for transcribing ciphers. In this paper we propose an unsupervised method for transcribing encrypted manuscripts based on clustering and label propagation, which has been successfully applied to community detection in networks. We analyze the performance on ciphers with various symbol sets, and discuss the advantages and drawbacks compared to supervised HTR methods. |
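A minimal sketch of the label-propagation family the abstract refers to: unlabelled nodes repeatedly adopt the majority label among already-labelled neighbours, so a few seed labels spread through the symbol-similarity graph. The paper's clustering-based variant is more elaborate; this is only the general mechanism:

```python
def propagate_labels(neighbors, seed_labels, iterations=10):
    """Iterative label propagation on a graph given as {node: [neighbors]}.
    Seed nodes keep their label; other nodes take the majority label
    among labelled neighbours, spreading labels through the graph."""
    labels = dict(seed_labels)
    for _ in range(iterations):
        updated = dict(labels)
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                continue
            votes = [labels[n] for n in nbrs if n in labels]
            if votes:
                updated[node] = max(set(votes), key=votes.count)
        labels = updated
    return labels
```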
|
|
Address |
Brussels; May 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DATeCH |
|
|
Notes |
DAG; 600.097; 600.140; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BCF2019 |
Serial |
3276 |
|
Permanent link to this record |
|
|
|
|
Author |
Lei Kang; Marçal Rusiñol; Alicia Fornes; Pau Riba; Mauricio Villegas |
|
|
Title |
Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition |
Type |
Conference Article |
|
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions on real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges: (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance, thus providing a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step. |
|
|
Address |
Aspen; Colorado; USA; March 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.129; 600.140; 601.302; 601.312; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KRF2020 |
Serial |
3446 |
|
Permanent link to this record |
|
|
|
|
Author |
Lichao Zhang; Abel Gonzalez-Garcia; Joost Van de Weijer; Martin Danelljan; Fahad Shahbaz Khan |
|
|
Title |
Learning the Model Update for Siamese Trackers |
Type |
Conference Article |
|
Year |
2019 |
Publication |
18th IEEE International Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
4009-4018 |
|
|
Keywords |
|
|
|
Abstract |
Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time. While such an approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. Therefore, we propose to replace the handcrafted update function with a method which learns to update. We use a convolutional neural network, called UpdateNet, which, given the initial template, the accumulated template and the template of the current frame, aims to estimate the optimal template for the next frame. The UpdateNet is compact and can easily be integrated into existing Siamese trackers. We demonstrate the generality of the proposed approach by applying it to two Siamese trackers, SiamFC and DaSiamRPN. Extensive experiments on VOT2016, VOT2018, LaSOT, and TrackingNet datasets demonstrate that our UpdateNet effectively predicts the new target template, outperforming the standard linear update. On the large-scale TrackingNet dataset, our UpdateNet improves the results of DaSiamRPN with an absolute gain of 3.9% in terms of success score. |
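The handcrafted baseline that UpdateNet replaces is the running-average template update described above, under which a frame's contribution decays exponentially with age. The rate `gamma` is tracker-specific; the default below is an arbitrary illustrative value:

```python
def linear_template_update(accumulated, current, gamma=0.01):
    """Standard Siamese template update: convex combination of the
    accumulated template and the current-frame template, so the weight of
    a frame observed t steps ago decays like (1 - gamma) ** t."""
    return [(1 - gamma) * a + gamma * c for a, c in zip(accumulated, current)]
```

UpdateNet replaces this fixed rule with a small network that also sees the initial template, letting the update adapt to appearance changes.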
|
|
Address |
Seoul; Korea; October 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICCV |
|
|
Notes |
LAMP; 600.109; 600.141; 600.120 |
Approved |
no |
|
|
Call Number |
Admin @ si @ ZGW2019 |
Serial |
3295 |
|
Permanent link to this record |
|
|
|
|
Author |
Lichao Zhang; Martin Danelljan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan |
|
|
Title |
Multi-Modal Fusion for End-to-End RGB-T Tracking |
Type |
Conference Article |
|
Year |
2019 |
Publication |
IEEE International Conference on Computer Vision Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2252-2261 |
|
|
Keywords |
|
|
|
Abstract |
We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on VOT-RGBT2019 dataset and RGBT210 dataset, evaluating each type of modality fusing on each model component. The results show that the proposed fusion mechanisms improve the performance of the single modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on VOT-RGBT2019 dataset. With this fusion mechanism we achieve the state-of-the-art performance on RGBT210 dataset. |
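The three fusion levels considered (pixel, feature, response) can be sketched generically; the weighted-sum and concatenation forms below are simplified stand-ins for the learned fusion modules evaluated in the paper:

```python
def pixel_level_fusion(rgb, tir, w=0.5):
    """Fuse before the network: weighted average of aligned RGB/TIR pixels."""
    return [w * r + (1.0 - w) * t for r, t in zip(rgb, tir)]

def feature_level_fusion(rgb_feat, tir_feat):
    """Fuse inside the network: concatenate per-modality feature vectors."""
    return list(rgb_feat) + list(tir_feat)

def response_level_fusion(rgb_resp, tir_resp):
    """Fuse at the output: average the per-location response maps."""
    return [(a + b) / 2.0 for a, b in zip(rgb_resp, tir_resp)]
```

The paper reports its best results when fusing at the feature level, where the subsequent layers can learn how to weight the two modalities.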
|
|
Address |
Seoul; Korea; October 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICCVW |
|
|
Notes |
LAMP; 600.109; 600.141; 600.120 |
Approved |
no |
|
|
Call Number |
Admin @ si @ ZDG2019 |
Serial |
3279 |
|
Permanent link to this record |
|
|
|
|
Author |
Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Exploring Hate Speech Detection in Multimodal Publications |
Type |
Conference Article |
|
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research. |
|
|
Address |
Aspen; March 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GGG2020a |
Serial |
3280 |
|
Permanent link to this record |
|
|
|
|
Author |
Lu Yu; Vacit Oguz Yazici; Xialei Liu; Joost Van de Weijer; Yongmei Cheng; Arnau Ramisa |
|
|
Title |
Learning Metrics from Teachers: Compact Networks for Image Embedding |
Type |
Conference Article |
|
Year |
2019 |
Publication |
32nd IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2907-2916 |
|
|
Keywords |
|
|
|
Abstract |
Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. In this paper, we propose to use network distillation to efficiently compute image embeddings with small networks. Network distillation has been successfully applied to improve image classification, but has hardly been explored for metric learning. To do so, we propose two new loss functions that model the communication of a deep teacher network to a small student network. We evaluate our system on several datasets, including CUB-200-2011, Cars-196 and Stanford Online Products, and show that embeddings computed using small student networks perform significantly better than those computed using standard networks of similar size. Results on a very compact network (MobileNet-0.25), which can be used on mobile devices, show that the proposed method can greatly improve Recall@1 results, from 27.5% to 44.6%. Furthermore, we investigate various aspects of distillation for embeddings, including hint and attention layers, semi-supervised learning and cross-quality distillation. |
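The two teacher-to-student losses can be sketched as an absolute term (match the teacher's embedding directly) and a relative term (match the teacher's pairwise distance structure). The squared-error forms below are illustrative, not the paper's exact formulations:

```python
def absolute_loss(student, teacher):
    """Squared L2 distance between the student and teacher embeddings
    of the same image: the student mimics the teacher pointwise."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher))

def relative_loss(s1, s2, t1, t2):
    """Penalize the student when a pair's distance in student space differs
    from the same pair's distance in teacher space, preserving the metric
    structure rather than the exact coordinates."""
    ds = sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5
    dt = sum((a - b) ** 2 for a, b in zip(t1, t2)) ** 0.5
    return (ds - dt) ** 2
```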
|
|
Address |
Long Beach; California; June 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPR |
|
|
Notes |
LAMP; 600.109; 600.120 |
Approved |
no |
|
|
Call Number |
Admin @ si @ YYL2019 |
Serial |
3281 |
|
Permanent link to this record |
|
|
|
|
Author |
Marçal Rusiñol |
|
|
Title |
Classificació semàntica i visual de documents digitals |
Type |
Journal |
|
Year |
2019 |
Publication |
Revista de biblioteconomia i documentacio |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
75-86 |
|
|
Keywords |
|
|
|
Abstract |
Automatic processing systems that work on digitized documents with the aim of describing their contents are analyzed; such systems help facilitate access, enable automatic indexing and make documents accessible to search engines. The goal of these technologies is to train computational models capable of classifying, clustering or searching over digital documents, so the tasks of classification, clustering and retrieval are described. When we use artificial intelligence technologies, we expect a classification system to return semantic labels; a clustering system to return documents grouped into meaningful clusters; and a retrieval system, given a query, to return a list of documents ranked by relevance. An overview is then given of the methods that allow us to describe digital documents, both visually (what they look like) and through their semantic content (what they talk about). Regarding the visual description of documents, the state of the art of numerical representations of digitized documents is reviewed, covering both classical methods and methods based on deep learning. Regarding the semantic description of contents, the techniques analyzed include optical character recognition (OCR); the computation of basic statistics on the occurrence of the different words in a text (the bag-of-words model); and deep learning methods such as word2vec, based on a neural network that, given a few words of a text, must predict the next word. Knowledge from the engineering fields is being transferred into products and services in the domains of archives, library science, documentation and mass-market platforms; however, the algorithms must be efficient enough not only for recognition and literal transcription but also for interpreting the contents. |
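The bag-of-words statistics mentioned in the abstract reduce a text to word-occurrence counts, discarding order; a minimal sketch:

```python
from collections import Counter

def bag_of_words(text):
    """Bag-of-words model: word frequencies, ignoring order and case.
    Real systems would also normalize punctuation and apply a vocabulary."""
    return Counter(text.lower().split())
```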
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.084; 600.135; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Rus2019 |
Serial |
3282 |
|
Permanent link to this record |