Author Chris Bahnsen; David Vazquez; Antonio Lopez; Thomas B. Moeslund
Title Learning to Remove Rain in Traffic Surveillance by Using Synthetic Data Type Conference Article
Year 2019 Publication 14th International Conference on Computer Vision Theory and Applications Abbreviated Journal
Volume Issue Pages 123-130
Keywords Rain Removal; Traffic Surveillance; Image Denoising
Abstract Rainfall is a problem in automated traffic surveillance. Rain streaks occlude the road users and degrade the overall visibility, which in turn decreases object detection performance. One way of alleviating this is by artificially removing the rain from the images. This requires knowledge of corresponding rainy and rain-free images. Such images are often produced by overlaying synthetic rain on top of rain-free images. However, this method fails to incorporate the fact that rain falls in the entire three-dimensional volume of the scene. To overcome this, we introduce training data from the SYNTHIA virtual world that models rain streaks in the entirety of a scene. We train a conditional Generative Adversarial Network for rain removal and apply it to traffic surveillance images from the SYNTHIA and AAU RainSnow datasets. To measure the applicability of the rain-removed images in a traffic surveillance context, we run the YOLOv2 object detection algorithm on the original and rain-removed frames. The results on SYNTHIA show an 8% increase in detection accuracy compared to the original rain image. Interestingly, we find that high PSNR or SSIM scores do not imply good object detection performance.
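A minimal NumPy sketch of the PSNR metric referenced in the abstract may help make the last observation concrete; the function name and the synthetic image arrays below are illustrative stand-ins, not the authors' evaluation code.
```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio between a reference image and a restored one."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_value ** 2) / mse)

# Synthetic stand-ins for a rain-free frame and its rain-removed counterpart.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(240, 320, 3))
restored = np.clip(clean + rng.normal(0.0, 5.0, clean.shape), 0, 255)
print(f"PSNR: {psnr(clean, restored):.2f} dB")
```
A high PSNR only says the restored pixels are close to the reference on average; it says nothing about whether a detector such as YOLOv2 will find the road users, which is the point the abstract makes.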
Address Prague; Czech Republic; February 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ BVL2019 Serial 3256
 

 
Author Pau Rodriguez
Title Towards Robust Neural Models for Fine-Grained Image Recognition Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Fine-grained recognition, i.e. identifying similar subcategories of the same superclass, is central to human activity. Recognizing a friend, finding bacteria in microscopic imagery, or discovering a new kind of galaxy are just a few examples. However, fine-grained image recognition is still a challenging computer vision task, since the differences between two images of the same category can overwhelm the differences between two images of different fine-grained categories. In this regime, where the difference between two categories resides in subtle input changes, excessively invariant CNNs discard those details that help to discriminate between categories and focus on more obvious changes, yielding poor classification performance.
On the other hand, CNNs with too much capacity tend to memorize instance-specific details, thus causing overfitting. In this thesis, motivated by the potential impact of automatic fine-grained image recognition, we tackle the previous challenges and demonstrate that proper alignment of the inputs, multiple levels of attention, regularization, and explicit modeling of the output space result in more accurate fine-grained recognition models that generalize better and are more robust to intra-class variation. Concretely, we study the different stages of the neural network pipeline: input pre-processing, attention to regions, feature activations, and the label space. In each stage, we address different issues that hinder the recognition performance on various fine-grained tasks, and devise solutions in each chapter: i) We deal with the sensitivity to input alignment on fine-grained human facial motion such as pain. ii) We introduce an attention mechanism to allow CNNs to choose and process in detail the most discriminative regions of the image. iii) We further extend attention mechanisms to act on the network activations, thus allowing them to correct their predictions by looking back at certain regions, at different levels of abstraction. iv) We propose a regularization loss to prevent high-capacity neural networks from memorizing instance details by means of almost-identical feature detectors. v) We finally study the advantages of explicitly modeling the output space within the error-correcting framework. As a result, in this thesis we demonstrate that attention and regularization, together with proper treatment of the input and the output space, seem promising directions to overcome the problems of fine-grained image recognition.
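As a rough illustration of the kind of soft spatial attention discussed in points (ii)-(iii), the sketch below re-weights a feature map with a softmax mask over locations; it is a generic NumPy example under assumed shapes, not the architecture proposed in the thesis.
```python
import numpy as np

def soft_spatial_attention(features, query):
    """Re-weight a C x H x W feature map with a softmax over per-location scores.

    `features` has shape (C, H, W); `query` has shape (C,). Returns the attended
    feature vector and the H x W attention mask.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)             # (C, HW)
    scores = query @ flat                         # one relevance score per location
    scores = scores - scores.max()                # numerical stability
    mask = np.exp(scores) / np.exp(scores).sum()  # softmax over locations
    attended = flat @ mask                        # weighted sum of local features
    return attended, mask.reshape(h, w)

feats = np.random.rand(64, 8, 8)
query = np.random.rand(64)
vec, mask = soft_spatial_attention(feats, query)
print(vec.shape, mask.shape)  # (64,) (8, 8)
```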
Address March 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez; Josep M. Gonfaus; Xavier Roca
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-3-5 Medium
Area Expedition Conference
Notes ISE; 600.119 Approved no
Call Number Admin @ si @ Rod2019 Serial 3258
 

 
Author Xim Cerda-Company
Title Understanding color vision: from psychophysics to computational modeling Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this PhD we have approached human color vision from two different points of view: psychophysics and computational modeling. First, we have evaluated 15 different tone-mapping operators (TMOs). We have conducted two experiments that consider two different criteria: the first one evaluates the local relationships among intensity levels and the second one evaluates the global appearance of the tone-mapped images w.r.t. the physical one (presented side by side). We conclude that the rankings depend on the criterion and are not correlated. Considering both criteria, the best TMOs are KimKautz (Kim and Kautz, 2008) and Krawczyk (Krawczyk, Myszkowski, and Seidel, 2005). Another conclusion is that more standardized evaluation criteria are needed to enable a fair comparison among TMOs.
Secondly, we have conducted several psychophysical experiments to study color induction. We have studied two different properties of the visual stimuli: temporal frequency and luminance spatial distribution. To study the temporal frequency, we defined equiluminant stimuli composed of both uniform and striped surrounds and flashed them, varying the flash duration. For uniform surrounds, the results show that color induction depends on both the flash duration and the inducer's chromaticity. As expected, in all chromatic conditions color contrast was induced. In contrast, for striped surrounds, we expected to induce color assimilation, but we observed color contrast or no induction. Since similar but not equiluminant striped stimuli induce color assimilation, we concluded that luminance differences could be a key factor in inducing color assimilation. Thus, in a subsequent study, we examined the effect of luminance differences on color assimilation. We varied the luminance difference between the target region and its inducers and observed that color assimilation depends on both this difference and the inducer's chromaticity. For the red-green condition (where the first inducer is red and the second one is green), color assimilation occurs in almost all luminance conditions. Instead, for the green-red condition, color assimilation never occurs. The purple-lime and lime-purple chromatic conditions show that the luminance difference is a key factor in inducing color assimilation. When the target is darker than its surround, color assimilation is stronger in purple-lime, while when the target is brighter, color assimilation is stronger in lime-purple (a 'mirroring' effect). Moreover, we evaluated whether color assimilation is due to luminance or brightness differences. Similarly to the equiluminance condition, when the stimuli are equated in brightness, no color assimilation is induced. Our results support the hypothesis that mutual inhibition plays a major role in color perception, or at least in color induction.
Finally, we have defined a new firing-rate model of color processing in the V1 parvocellular pathway. We have modeled two different layers of this cortical area: layers 4Cb and 2/3. Our model is a recurrent dynamic computational model that considers both excitatory and inhibitory cells and their lateral connections. Moreover, it considers the existing laminar differences and the variety of cell types. Thus, we have modeled both single- and double-opponent simple cells and complex cells, which are a pool of double-opponent simple cells. A set of sinusoidal drifting gratings has been used to test the architecture. In these gratings we have varied several properties, such as the temporal and spatial frequencies, the grating's area and its orientation. To reproduce the electrophysiological observations, the architecture has to consider the existence of non-oriented double-opponent cells in layer 4Cb and the lack of lateral connections between single-opponent cells. Moreover, we have tested our lateral connections by simulating the center-surround modulation, and we have reproduced physiological measurements where, for high-contrast stimuli, the result of the lateral connections is inhibitory, while it is facilitatory for low-contrast stimuli.
Address March 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Xavier Otazu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-4-2 Medium
Area Expedition Conference
Notes NEUROBIT Approved no
Call Number Admin @ si @ Cer2019 Serial 3259
 

 
Author Julio C. S. Jacques Junior; Cagri Ozcinar; Marina Marjanovic; Xavier Baro; Gholamreza Anbarjafari; Sergio Escalera
Title On the effect of age perception biases for real age regression Type Conference Article
Year 2019 Publication 14th IEEE International Conference on Automatic Face and Gesture Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Automatic age estimation from facial images represents an important task in computer vision. This paper analyses the effect of gender, age, ethnicity, makeup and expression attributes of faces as sources of bias to improve deep apparent age prediction. Following recent works showing that apparent age labels benefit real age estimation, rather than direct real-to-real age regression, our main contribution is the integration, in an end-to-end architecture, of face attributes for apparent age prediction with an additional loss for real age regression. Experimental results on the APPA-REAL dataset indicate that the proposed network successfully takes advantage of the adopted attributes to improve both apparent and real age estimation. Our model outperformed a state-of-the-art architecture proposed to separately address apparent and real age regression. Finally, we present preliminary results and a discussion of a proof-of-concept application using the proposed model to regress the apparent age of an individual based on the gender of an external observer.
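A hedged sketch of the kind of joint objective described above (an apparent-age term plus an additional real-age regression term); the L1 losses, the function name and the weighting factor `lambda_real` are illustrative assumptions, not the paper's exact formulation.
```python
import numpy as np

def joint_age_loss(apparent_pred, apparent_gt, real_pred, real_gt, lambda_real=1.0):
    """Combine an apparent-age loss with an auxiliary real-age regression loss."""
    apparent_term = np.mean(np.abs(apparent_pred - apparent_gt))  # apparent-age error
    real_term = np.mean(np.abs(real_pred - real_gt))              # real-age error
    return apparent_term + lambda_real * real_term

print(joint_age_loss(np.array([31.0, 45.0]), np.array([30.0, 47.0]),
                     np.array([29.0, 44.0]), np.array([28.0, 46.0])))
```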
Address Lille; France; May 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference FG
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ JOM2019 Serial 3262
 

 
Author Bojana Gajic; Ariel Amato; Ramon Baldrich; Carlo Gatta
Title Bag of Negatives for Siamese Architectures Type Conference Article
Year 2019 Publication 30th British Machine Vision Conference Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Training a Siamese architecture for re-identification with a large number of identities is a challenging task due to the difficulty of finding relevant negative samples efficiently. In this work we present Bag of Negatives (BoN), a method for accelerated and improved training of Siamese networks that scales well on datasets with a very large number of identities. BoN is an efficient and loss-independent method, able to select a bag of high quality negatives, based on a novel online hashing strategy.
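The abstract only outlines the online hashing idea, so the toy sketch below is one plausible reading of it rather than the authors' algorithm: embeddings are hashed with random hyperplanes and negatives are sampled from the anchor's bucket, on the assumption that bucket mates are hard, high-quality negatives.
```python
import numpy as np

def hash_bucket(embedding, projections):
    """Binary hash code of an embedding via random hyperplane projections (sign bits)."""
    return tuple((embedding @ projections > 0).astype(int))

def sample_negatives(anchor_id, embeddings, identities, projections, n_neg=4, seed=0):
    """Pick negatives in the anchor's hash bucket that belong to a different identity."""
    rng = np.random.default_rng(seed)
    anchor_code = hash_bucket(embeddings[anchor_id], projections)
    candidates = [i for i in range(len(embeddings))
                  if identities[i] != identities[anchor_id]
                  and hash_bucket(embeddings[i], projections) == anchor_code]
    if not candidates:  # fall back to any negative if the bucket is empty
        candidates = [i for i in range(len(embeddings)) if identities[i] != identities[anchor_id]]
    return rng.choice(candidates, size=min(n_neg, len(candidates)), replace=False)

emb = np.random.rand(100, 32)
ids = np.random.randint(0, 20, size=100)
proj = np.random.randn(32, 8)
print(sample_negatives(0, emb, ids, proj))
```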
Address Cardiff; United Kingdom; September 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference BMVC
Notes CIC; 600.140; 600.118 Approved no
Call Number Admin @ si @ GAB2019b Serial 3263
 

 
Author Raul Gomez; Ali Furkan Biten; Lluis Gomez; Jaume Gibert; Marçal Rusiñol; Dimosthenis Karatzas
Title Selective Style Transfer for Text Type Conference Article
Year 2019 Publication 15th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 805-812
Keywords transfer; text style transfer; data augmentation; scene text detection
Abstract This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions. Results on different text domains (scene text, machine-printed text and handwritten text) and cross-modal results demonstrate that this is feasible and open different research lines. Furthermore, two architectures for selective style transfer, which means transferring style only to the desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost of text detector performance. Our implementation of the described models is publicly available.
Address Sydney; Australia; September 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.129; 600.135; 601.338; 601.310; 600.121 Approved no
Call Number GBG2019 Serial 3265
 

 
Author Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
Title Self-Supervised Learning from Web Data for Multimodal Retrieval Type Book Chapter
Year 2019 Publication Multi-Modal Scene Understanding Book Abbreviated Journal
Volume Issue Pages 279-306
Keywords self-supervised learning; webly supervised learning; text embeddings; multimodal retrieval; multimodal embedding
Abstract Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need for human-annotated data. Web and social media platforms provide a virtually unlimited amount of this multimodal data. In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision and analyze the semantic structure of the learnt joint image and text embedding space. We perform a thorough analysis and performance comparison of five different state-of-the-art text embeddings in three different benchmarks. We show that the embeddings learnt with Web and social media data have competitive performance over supervised methods in the text-based image retrieval task, and we clearly outperform the state of the art in the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts that can be used for fair comparison of image-text embeddings.
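Once images and text live in a joint embedding space, retrieval reduces to a nearest-neighbour search; the minimal cosine-similarity sketch below illustrates that step with random stand-in embeddings and is not the chapter's pipeline.
```python
import numpy as np

def retrieve(query_embedding, image_embeddings, top_k=5):
    """Rank images by cosine similarity to a query embedded in the same joint space."""
    q = query_embedding / np.linalg.norm(query_embedding)
    imgs = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    similarities = imgs @ q
    return np.argsort(-similarities)[:top_k]  # indices of the best-matching images

# Toy joint space: 1,000 image embeddings and one text-query embedding of dimension 300.
image_emb = np.random.randn(1000, 300)
text_query_emb = np.random.randn(300)
print(retrieve(text_query_emb, image_emb))
```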
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.129; 601.338; 601.310 Approved no
Call Number Admin @ si @ GGG2019 Serial 3266
 

 
Author Rafael E. Rivadeneira; Patricia Suarez; Angel Sappa; Boris X. Vintimilla
Title Thermal Image SuperResolution Through Deep Convolutional Neural Network Type Conference Article
Year 2019 Publication 16th International Conference on Image Analysis and Recognition Abbreviated Journal
Volume Issue Pages 417-426
Keywords
Abstract Due to the lack of thermal image datasets, a new dataset has been acquired in order to propose a super-resolution approach using a Deep Convolutional Neural Network schema. To achieve this image enhancement process, this new thermal image dataset is used. Different experiments have been carried out: first, the proposed architecture was trained using only images of the visible spectrum, and later it was trained with images of the thermal spectrum. The results showed that with the network trained on thermal images, better results are obtained in the process of enhancing the images, maintaining the image details and perspective. The thermal dataset is available at http://www.cidis.espol.edu.ec/es/dataset.
Address Waterloo; Canada; August 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICIAR
Notes MSIAU; 600.130; 601.349; 600.122 Approved no
Call Number Admin @ si @ RSS2019 Serial 3269
 

 
Author Angel Morera; Angel Sanchez; Angel Sappa; Jose F. Velez
Title Robust Detection of Outdoor Urban Advertising Panels in Static Images Type Conference Article
Year 2019 Publication 18th International Conference on Practical Applications of Agents and Multi-Agent Systems Abbreviated Journal
Volume Issue Pages 246-256
Keywords Object detection; Urban ads panels; Deep learning; Single Shot Detector (SSD) architecture; Intersection over Union (IoU) metric; Augmented Reality
Abstract One interesting publicity application for Smart City environments is recognizing brand information contained in urban advertising panels. For such a purpose, a prior stage is to accurately detect and locate the position of these panels in images. This work presents an effective solution to this problem using a Single Shot Detector (SSD) based on a deep neural network architecture that minimizes the number of false detections under multiple variable conditions regarding the panels and the scene. The experimental results achieved with the Intersection over Union (IoU) accuracy metric make this proposal applicable to real, complex urban images.
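For readers unfamiliar with the IoU metric named in the keywords and abstract, a minimal sketch of its standard definition follows; the example boxes are made up and this is not the authors' evaluation code.
```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Detected panel box vs. a ground-truth panel box.
print(iou((10, 10, 110, 60), (20, 15, 120, 70)))
```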
Address L'Aquila; Italy; June 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference PAAMS
Notes MSIAU; 600.130; 600.122 Approved no
Call Number Admin @ si @ MSS2019 Serial 3270
 

 
Author Armin Mehri; Angel Sappa
Title Colorizing Near Infrared Images through a Cyclic Adversarial Approach of Unpaired Samples Type Conference Article
Year 2019 Publication IEEE International Conference on Computer Vision and Pattern Recognition-Workshops Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This paper presents a novel approach for colorizing near infrared (NIR) images. The approach is based on image-to-image translation using a Cycle-Consistent adversarial network for learning the color channels on an unpaired dataset. This architecture is able to handle unpaired datasets. The approach uses as generators tailored networks that require less computation time, converge faster and generate high-quality samples. The obtained results have been evaluated quantitatively (using standard evaluation metrics) and qualitatively, showing considerable improvements with respect to the state of the art.
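The cycle-consistency term that makes training on unpaired data possible can be summarized by the standard reconstruction loss; in the hedged sketch below the two generator callables are placeholders, not the tailored networks described in the paper.
```python
import numpy as np

def cycle_consistency_loss(nir_batch, rgb_batch, g_nir_to_rgb, g_rgb_to_nir):
    """L1 cycle loss: translate to the other domain and back, compare to the input."""
    nir_cycle = g_rgb_to_nir(g_nir_to_rgb(nir_batch))   # NIR -> RGB -> NIR
    rgb_cycle = g_nir_to_rgb(g_rgb_to_nir(rgb_batch))   # RGB -> NIR -> RGB
    return np.mean(np.abs(nir_cycle - nir_batch)) + np.mean(np.abs(rgb_cycle - rgb_batch))

# Identity generators as stand-ins: the loss is trivially zero in this toy case.
identity = lambda x: x
print(cycle_consistency_loss(np.random.rand(2, 64, 64, 1),
                             np.random.rand(2, 64, 64, 3),
                             identity, identity))
```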
Address Long Beach; California; USA; June 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes MSIAU; 600.130; 601.349; 600.122 Approved no
Call Number Admin @ si @ MeS2019 Serial 3271
 

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud
Title Image Vegetation Index through a Cycle Generative Adversarial Network Type Conference Article
Year 2019 Publication IEEE International Conference on Computer Vision and Pattern Recognition-Workshops Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This paper proposes a novel approach to estimate the Normalized Difference Vegetation Index (NDVI) just from an RGB image. The NDVI values are obtained by using images from the visible spectral band together with a synthetic near infrared image obtained by a cycled GAN. The cycled GAN network is able to obtain a NIR image from a given gray-scale image. It is trained using an unpaired set of gray-scale and NIR images, a U-net architecture and a multiple loss function (the gray-scale images are obtained from the provided RGB images). Then, the NIR image estimated with the proposed cycle generative adversarial network is used to compute the NDVI index. Experimental results are provided showing the validity of the proposed approach. Additionally, comparisons with previous approaches are also provided.
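The NDVI itself is the fixed formula (NIR - RED) / (NIR + RED); the short sketch below computes it with a random array standing in for the GAN-estimated NIR band, and is only an illustration of that final step, not the paper's pipeline.
```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

rgb = np.random.rand(128, 128, 3)          # input RGB image
estimated_nir = np.random.rand(128, 128)   # stand-in for the GAN-estimated NIR band
index = ndvi(estimated_nir, rgb[..., 0])   # assuming channel 0 is the red band
print(index.min(), index.max())            # values fall roughly in [-1, 1]
```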
Address Long Beach; California; USA; June 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes MSIAU; 600.130; 601.349; 600.122 Approved no
Call Number Admin @ si @ SSV2019 Serial 3272
 

 
Author Arnau Baro; Jialuo Chen; Alicia Fornes; Beata Megyesi
Title Towards a generic unsupervised method for transcription of encoded manuscripts Type Conference Article
Year 2019 Publication 3rd International Conference on Digital Access to Textual Cultural Heritage Abbreviated Journal
Volume Issue Pages 73-78
Keywords A. Baró, J. Chen, A. Fornés, B. Megyesi.
Abstract Historical ciphers, a special type of manuscripts, contain encrypted information that is important for the interpretation of our history. The first step towards decipherment is to transcribe the images, either manually or by automatic image processing techniques. Despite the improvements in handwritten text recognition (HTR) thanks to deep learning methodologies, the need for labelled training data is an important limitation. Given that ciphers often use symbol sets across various alphabets and unique symbols without any transcription scheme available, these supervised HTR techniques are not suitable for transcribing ciphers. In this paper we propose an unsupervised method for transcribing encrypted manuscripts based on clustering and label propagation, which has been successfully applied to community detection in networks. We analyze the performance on ciphers with various symbol sets, and discuss the advantages and drawbacks compared to supervised HTR methods.
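As a hedged illustration of the label-propagation idea, the sketch below uses scikit-learn's generic LabelPropagation over synthetic symbol descriptors in which only two samples carry a transcription; it stands in for, and does not reproduce, the graph-based propagation the paper applies to cipher symbols.
```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Synthetic symbol descriptors: two loose clusters standing in for two cipher symbols.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.3, size=(50, 16)),
                      rng.normal(2.0, 0.3, size=(50, 16))])

# Only one sample per cluster carries a manually assigned transcription; -1 means unknown.
labels = np.full(100, -1)
labels[0], labels[50] = 0, 1

model = LabelPropagation(kernel="rbf", gamma=1.0)
model.fit(features, labels)
print(model.transduction_[:5], model.transduction_[50:55])  # propagated transcriptions
```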
Address Brussels; May 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DATeCH
Notes DAG; 600.097; 600.140; 600.121 Approved no
Call Number Admin @ si @ BCF2019 Serial 3276
 

 
Author Lei Kang; Marçal Rusiñol; Alicia Fornes; Pau Riba; Mauricio Villegas
Title Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition Type Conference Article
Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions on real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges: (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance; thus, it provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step.
Address Aspen; Colorado; USA; March 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG; 600.129; 600.140; 601.302; 601.312; 600.121 Approved no
Call Number Admin @ si @ KRF2020 Serial 3446
 

 
Author Lichao Zhang; Abel Gonzalez-Garcia; Joost Van de Weijer; Martin Danelljan; Fahad Shahbaz Khan
Title Learning the Model Update for Siamese Trackers Type Conference Article
Year 2019 Publication 18th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 4009-4018
Keywords
Abstract Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time. While such an approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. Therefore, we propose to replace the handcrafted update function with a method that learns to update. We use a convolutional neural network, called UpdateNet, which, given the initial template, the accumulated template and the template of the current frame, aims to estimate the optimal template for the next frame. The UpdateNet is compact and can easily be integrated into existing Siamese trackers. We demonstrate the generality of the proposed approach by applying it to two Siamese trackers, SiamFC and DaSiamRPN. Extensive experiments on the VOT2016, VOT2018, LaSOT, and TrackingNet datasets demonstrate that our UpdateNet effectively predicts the new target template, outperforming the standard linear update. On the large-scale TrackingNet dataset, our UpdateNet improves the results of DaSiamRPN with an absolute gain of 3.9% in terms of success score.
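The handcrafted linear update that UpdateNet replaces fits in a few lines; the sketch below shows that baseline and the point where a learned update function would be substituted. The `learned_update` callable, the template shape and the value of `gamma` are placeholders, not UpdateNet or the paper's settings.
```python
import numpy as np

def linear_update(accumulated, current, gamma=0.1):
    """Running-average template update: old information decays exponentially."""
    return (1.0 - gamma) * accumulated + gamma * current

def updated_template(initial, accumulated, current, learned_update=None):
    """Use a learned update if one is provided, otherwise fall back to the linear rule."""
    if learned_update is not None:
        # Placeholder for a network such as UpdateNet(initial, accumulated, current).
        return learned_update(initial, accumulated, current)
    return linear_update(accumulated, current)

t0 = np.random.rand(6, 6, 256)         # initial template from the first frame
acc = t0.copy()
for _ in range(5):                      # simulate five frames with the linear baseline
    acc = updated_template(t0, acc, np.random.rand(6, 6, 256))
print(acc.shape)
```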
Address Seoul; Korea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV
Notes LAMP; 600.109; 600.141; 600.120 Approved no
Call Number Admin @ si @ ZGW2019 Serial 3295
 

 
Author Lichao Zhang; Martin Danelljan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan
Title Multi-Modal Fusion for End-to-End RGB-T Tracking Type Conference Article
Year 2019 Publication IEEE International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 2252-2261
Keywords
Abstract We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on VOT-RGBT2019 dataset and RGBT210 dataset, evaluating each type of modality fusing on each model component. The results show that the proposed fusion mechanisms improve the performance of the single modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on VOT-RGBT2019 dataset. With this fusion mechanism we achieve the state-of-the-art performance on RGBT210 dataset.
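A hedged sketch of feature-level fusion, the variant the abstract reports as working best: per-modality feature maps are concatenated along the channel axis before a shared prediction head. The shapes and names below are illustrative assumptions, not the DiMP-based architecture itself.
```python
import numpy as np

def fuse_features(rgb_features, tir_features):
    """Feature-level fusion: concatenate per-modality feature maps along channels."""
    assert rgb_features.shape[:2] == tir_features.shape[:2], "spatial sizes must match"
    return np.concatenate([rgb_features, tir_features], axis=-1)

rgb_feat = np.random.rand(18, 18, 256)   # backbone features from the RGB frame
tir_feat = np.random.rand(18, 18, 256)   # backbone features from the aligned TIR frame
fused = fuse_features(rgb_feat, tir_feat)
print(fused.shape)                        # (18, 18, 512), fed to the shared predictor
```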
Address Seoul; Korea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes LAMP; 600.109; 600.141; 600.120 Approved no
Call Number Admin @ si @ ZDG2019 Serial 3279