Publicacions CVC -- Query Results


Records
Author	Lichao Zhang
Title	Towards end-to-end Networks for Visual Tracking in RGB and TIR Videos			Type	Book Whole
Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In the current work, we identify several problems of current tracking systems. The lack of large-scale labeled datasets hampers the usage of deep learning, especially end-to-end training, for tracking in TIR images. Therefore, many methods for tracking on TIR data are still based on hand-crafted features. This situation also happens in multi-modal tracking, e.g. RGB-T tracking. Another reason, which hampers the development of RGB-T tracking, is that there exists little research on the fusion mechanisms for combining information from RGB and TIR modalities. One of the crucial components of most trackers is the update module. For the currently existing end-to-end tracking architectures, e.g., Siamese trackers, the online model update is still not taken into consideration at the training stage. They use no-update or a linear update strategy during the inference stage. While such a hand-crafted approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. To address the data scarcity for TIR and RGB-T tracking, we use image-to-image translation to generate a large-scale synthetic TIR dataset. This dataset allows us to perform end-to-end training for TIR tracking. Furthermore, we investigate several fusion mechanisms for RGB-T tracking. The multi-modal trackers are also trained in an end-to-end manner on the synthetic data. To improve the standard online update, we pose the updating step as an optimization problem which can be solved by training a neural network. Our approach thereby reduces the hand-crafted components in the tracking pipeline and sets a further step in the direction of a complete end-to-end trained tracking network which also considers updating during optimization.
Address	November 2019
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Abel Gonzalez;Fahad Shahbaz Khan
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-121011-1-9	Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ Zha2019			Serial	3393



Author	Lu Yu
Title	Semantic Representation: From Color to Deep Embeddings			Type	Book Whole
Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings. In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exist. In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distill the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points are communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied.
We observe that some drift occurs during training of new tasks for metric learning. We introduce a method to estimate this semantic drift from the drift experienced by the data of the current task during its training. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
Address	November 2019
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Yongmei Cheng
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-121011-3-3	Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ Yu2019			Serial	3394



Author	Albert Berenguel
Title	Analysis of background textures in banknotes and identity documents for counterfeit detection			Type	Book Whole
Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due to the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents have a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts on detecting counterfeit banknotes and identity documents by analyzing the security features of the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturers' cutting-edge printing equipment. Our objective is to find the loss of resolution between the genuine security document and the counterfeit version printed with a publicly available commercial printer. We first present the most complete survey to date of identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we present the banknote and identity counterfeit dataset we have built and use throughout this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis.
We study this problem from the point of view of robustness, computational efficiency and applicability to a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service oriented architecture to detect counterfeit documents. The mobile application and the server framework are intended to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.
Address	November 2019
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Oriol Ramos Terrades;Josep Llados
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-121011-2-6	Medium
Area		Expedition		Conference
Notes	DAG; 600.140; 600.121			Approved	no
Call Number	Admin @ si @ Ber2019			Serial	3395



Author	Xialei Liu
Title	Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains			Type	Book Whole
Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition applications, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Another limitation is that training CNNs in a continual learning setting is still an open research question. Catastrophic forgetting is very likely when adapting trained models to new environments or new tasks. Therefore, in this thesis, we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks. Self-supervised learning leverages unlabelled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. Then we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning. We then apply our framework on two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50% for both IQA and crowd counting. In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios.
The first approach is derived from Elastic Weight Consolidation, which uses a diagonal Fisher Information Matrix (FIM) to measure the importance of the parameters of the network. However, the diagonal assumption is unrealistic. Therefore, we approximately diagonalize the FIM using a set of factorized rotation parameters. This leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers in the network and propose a hybrid approach where distillation is used in the feature extractor and replay in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks in a single, well-calibrated classifier. Experiments confirm that our proposed approach outperforms the baselines and some state-of-the-art methods.
Address	December 2019
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Andrew Bagdanov
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-121011-4-0	Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ Liu2019			Serial	3396



Author	Yaxing Wang
Title	Transferring and Learning Representations for Image Generation and Translation			Type	Book Whole
Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods which perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first one, called simply image generation, takes random noise as an input and synthesizes an image which follows the same distribution as the images in the training set. The second class, which is called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired image-to-image translation based on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image generation. GANs highly rely upon having access to vast quantities of data, and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we aim to transfer knowledge from a model trained on a large dataset (source domain) to the one learned on limited data (target domain). We find that both GANs and conditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than the generator. Using both the generator and discriminator results in the best performance. We found, however, that this method suffers from overfitting, since we update all parameters to adapt to the target data. We propose a novel architecture, which is tailored to address knowledge transfer to very small target domains. Our approach effectively explores which part of the latent space is more related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs.
Although image-to-image translation has achieved outstanding performance, it still faces several problems. First, for translation between complex domains (such as translations between different modalities) image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen (i.e. during training), we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach where we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascading the source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which are due to the fact that the target dataset has a particular underlying visual distribution. We use carefully designed semantic constraints to reduce the effects of the bias. The semantic constraint aims to enforce the preservation of desired image properties. Finally, current approaches fail to generate diverse outputs or perform scalable image transfer in a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation. We employ random noise to control the diversity. The scalability is determined by conditioning the domain label. Keywords: computer vision, deep learning, imitation learning, adversarial generative networks, image generation, image-to-image translation.
Address	January 2020
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Abel Gonzalez;Luis Herranz
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-121011-5-7	Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ Wan2020			Serial	3397



Author	Manisha Das; Deep Gupta; Petia Radeva; Ashwini M. Bakde
Title	Optimized CT-MR neurological image fusion framework using biologically inspired spiking neural model in hybrid ℓ1 - ℓ0 layer decomposition domain			Type	Journal Article
Year	2021	Publication	Biomedical Signal Processing and Control	Abbreviated Journal	BSPC
Volume	68	Issue		Pages	102535
Keywords
Abstract	Medical image fusion plays an important role in the clinical diagnosis of several critical neurological diseases by merging complementary information available in multimodal images. In this paper, a novel CT-MR neurological image fusion framework is proposed using an optimized biologically inspired feedforward neural model in two-scale hybrid ℓ1 − ℓ0 decomposition domain using gray wolf optimization to preserve the structural as well as texture information present in source CT and MR images. Initially, the source images are subjected to two-scale ℓ1 − ℓ0 decomposition with optimized parameters, giving a scale-1 detail layer, a scale-2 detail layer and a scale-2 base layer. Two detail layers at scale-1 and 2 are fused using an optimized biologically inspired neural model and weighted average scheme based on local energy and modified spatial frequency to maximize the preservation of edges and local textures, respectively, while the scale-2 base layer gets fused using choose max rule to preserve the background information. To optimize the hyper-parameters of hybrid ℓ1 − ℓ0 decomposition and biologically inspired neural model, a fitness function is evaluated based on spatial frequency and edge index of the resultant fused image obtained by adding all the fused components. The fusion performance is analyzed by conducting extensive experiments on different CT-MR neurological images. Experimental results indicate that the proposed method provides better-fused images and outperforms the other state-of-the-art fusion methods in both visual and quantitative assessments.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MILAB; no proj			Approved	no
Call Number	Admin @ si @ DGR2021b			Serial	3636



Author	Sergio Escalera; Stephane Ayache; Jun Wan; Meysam Madadi; Umut Guçlu; Xavier Baro
Title	Inpainting and Denoising Challenges			Type	Book Whole
Year	2019	Publication	The Springer Series on Challenges in Machine Learning	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The problem of dealing with missing or incomplete data in machine learning and computer vision arises in many applications. Recent strategies make use of generative models to impute missing or corrupted data. Advances in computer vision using deep generative models have found applications in image/video processing, such as denoising, restoration, super-resolution, or inpainting. Inpainting and Denoising Challenges comprises recent efforts dealing with image and video inpainting tasks. This includes winning solutions to the ChaLearn Looking at People inpainting and denoising challenges: human pose recovery, video de-captioning and fingerprint restoration. This volume starts with a wide review on image denoising, retracing and comparing various methods from the pioneering signal processing methods, to machine learning approaches with sparse and low-rank models, and recent deep learning architectures with autoencoders and variants. The following chapters present results from the Challenge, including three competition tasks at WCCI and ECML 2018. The top approaches submitted by participants are described, showing interesting contributions and innovative methods. The last two chapters propose novel contributions and highlight new applications that benefit from image/video inpainting.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HUPBA; no menciona			Approved	no
Call Number	Admin @ si @ EAW2019			Serial	3398



Author	Hugo Jair Escalante; Sergio Escalera; Isabelle Guyon; Xavier Baro; Yagmur Gucluturk; Umut Guçlu; Marcel van Gerven
Title	Explainable and Interpretable Models in Computer Vision and Machine Learning			Type	Book Whole
Year	2018	Publication	The Springer Series on Challenges in Machine Learning	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	This book compiles leading research on the development of explainable and interpretable machine learning methods in the context of computer vision and machine learning. Research progress in computer vision and pattern recognition has led to a variety of modeling techniques with almost human-like performance. Although these models have obtained astounding results, they are limited in their explainability and interpretability: what is the rationale behind the decision made? What in the model structure explains its functioning? Hence, while good performance is a critical required characteristic for learning machines, explainability and interpretability capabilities are needed to take learning machines to the next step to include them in decision support systems involving human supervision. This book, written by leading international researchers, addresses key topics of explainability and interpretability, including the following: · Evaluation and Generalization in Interpretable Machine Learning · Explanation Methods in Deep Learning · Learning Functional Causal Models with Generative Neural Networks · Learning Interpretable Rules for Multi-Label Classification · Structuring Neural Networks for More Explainable Predictions · Generating Post Hoc Rationales of Deep Visual Classification Decisions · Ensembling Visual Explanations · Explainable Deep Driving by Visualizing Causal Attention · Interdisciplinary Perspective on Algorithmic Job Candidate Search · Multimodal Personality Trait Analysis for Explainable Modeling of Job Interview Decisions · Inherent Explainability Pattern Theory-based Video Event Interpretations
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no menciona			Approved	no
Call Number	Admin @ si @ EEG2018			Serial	3399



Author	Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
Title	RoadText-1K: Text Detection and Recognition Dataset for Driving Videos			Type	Conference Article
Year	2020	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new “RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtext-1k
Address	Paris; France; ???
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICRA
Notes	DAG; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ RMG2020			Serial	3400



Author	Idoia Ruiz; Bogdan Raducanu; Rakesh Mehta; Jaume Amores
Title	Optimizing speed/accuracy trade-off for person re-identification via knowledge distillation			Type	Journal Article
Year	2020	Publication	Engineering Applications of Artificial Intelligence	Abbreviated Journal	EAAI
Volume	87	Issue		Pages	103309
Keywords	Person re-identification; Network distillation; Image retrieval; Model compression; Surveillance
Abstract	Finding a person across a camera network plays an important role in video surveillance. For a real-world person re-identification application, in order to guarantee an optimal time response, it is crucial to find the balance between accuracy and speed. We analyse this trade-off, comparing a classical method that comprises hand-crafted feature description and metric learning (in particular, LOMO and XQDA) to deep learning based techniques using image classification networks, ResNet and MobileNets. Additionally, we propose and analyse network distillation as a learning strategy to reduce the computational cost of the deep learning approach at test time. We evaluate both methods on the Market-1501 and DukeMTMC-reID large-scale datasets, showing that distillation helps reduce the computational cost at inference time while even increasing the accuracy.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.109; 600.120			Approved	no
Call Number	Admin @ si @ RRM2020			Serial	3401



Author	Lorenzo Porzi; Markus Hofinger; Idoia Ruiz; Joan Serrat; Samuel Rota Bulo; Peter Kontschieder
Title	Learning Multi-Object Tracking and Segmentation from Automatic Annotations			Type	Conference Article
Year	2020	Publication	33rd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	6845-6854
Keywords
Abstract	In this work we contribute a novel pipeline to automatically generate training data, and to improve over state-of-the-art multi-object tracking and segmentation (MOTS) methods. Our proposed track mining algorithm turns raw street-level videos into high-fidelity MOTS training data, is scalable and overcomes the need of expensive and time-consuming manual annotation approaches. We leverage state-of-the-art instance segmentation results in combination with optical flow predictions, also trained on automatically harvested training data. Our second major contribution is MOTSNet – a deep learning, tracking-by-detection architecture for MOTS – deploying a novel mask-pooling layer for improved object association over time. Training MOTSNet with our automatically extracted data leads to significantly improved sMOTSA scores on the novel KITTI MOTS dataset (+1.9%/+7.5% on cars/pedestrians), and MOTSNet improves by +4.1% over previously best methods on the MOTSChallenge dataset. Our most impressive finding is that we can improve over previous best-performing works, even in complete absence of manually annotated MOTS training data.
Address	virtual; June 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPR
Notes	ADAS; 600.124; 600.118			Approved	no
Call Number	Admin @ si @ PHR2020			Serial	3402



Author	Ana Garcia Rodriguez; Jorge Bernal; F. Javier Sanchez; Henry Cordova; Rodrigo Garces Duran; Cristina Rodriguez de Miguel; Gloria Fernandez Esparrach
Title	Polyp fingerprint: automatic recognition of colorectal polyps’ unique features			Type	Journal Article
Year	2020	Publication	Surgical Endoscopy and other Interventional Techniques	Abbreviated Journal	SEND
Volume	34	Issue	4	Pages	1887-1889
Keywords
Abstract	BACKGROUND: Content-based image retrieval (CBIR) is an application of machine learning used to retrieve images by similarity on the basis of features. Our objective was to develop a CBIR system that could identify images containing the same polyp ('polyp fingerprint'). METHODS: A machine learning technique called Bag of Words was used to describe each endoscopic image containing a polyp in a unique way. The system was tested with 243 white light images belonging to 99 different polyps (for each polyp there were at least two images representing it in two different temporal moments). Images were acquired in routine colonoscopies at Hospital Clínic using high-definition Olympus endoscopes. The method provided for each image the closest match within the dataset. RESULTS: The system matched another image of the same polyp in 221/243 cases (91%). No differences were observed in the number of correct matches according to Paris classification (protruded: 90.7% vs. non-protruded: 91.3%) and size (< 10 mm: 91.6% vs. > 10 mm: 90%). CONCLUSIONS: A CBIR system can match accurately two images containing the same polyp, which could be a helpful aid for polyp image recognition. KEYWORDS: Artificial intelligence; Colorectal polyps; Content-based image retrieval
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MV; no menciona			Approved	no
Call Number	Admin @ si @			Serial	3403



Author	Cristina Sanchez Montes; Jorge Bernal; Ana Garcia Rodriguez; Henry Cordova; Gloria Fernandez Esparrach
Title	Revisión de métodos computacionales de detección y clasificación de pólipos en imagen de colonoscopia			Type	Journal Article
Year	2020	Publication	Gastroenterología y Hepatología	Abbreviated Journal	GH
Volume	43	Issue	4	Pages	222-232
Keywords
Abstract	Computer-aided diagnosis (CAD) is a tool with great potential to help endoscopists in the tasks of detecting and histologically classifying colorectal polyps. In recent years, different technologies have been described and their potential utility has been increasingly evidenced, which has generated great expectations among scientific societies. However, most of these works are retrospective and use images of different quality and characteristics which are analysed off line. This review aims to familiarise gastroenterologists with computational methods and the particularities of endoscopic imaging, which have an impact on image processing analysis. Finally, the publicly available image databases, needed to compare and confirm the results obtained with different methods, are presented.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MV;			Approved	no
Call Number	Admin @ si @ SBG2020			Serial	3404



Author	Gabriel Villalonga; Joost Van de Weijer; Antonio Lopez
Title	Recognizing new classes with synthetic data in the loop: application to traffic sign recognition			Type	Journal Article
Year	2020	Publication	Sensors	Abbreviated Journal	SENS
Volume	20	Issue	3	Pages	583
Keywords
Abstract	On-board vision systems may need to increase the number of classes that can be recognized in a relatively short period. For instance, a traffic sign recognition system may suddenly be required to recognize new signs. Since collecting and annotating samples of such new classes may need more time than we wish, especially for uncommon signs, we propose a method to generate these samples by combining synthetic images and Generative Adversarial Network (GAN) technology. In particular, the GAN is trained on synthetic and real-world samples from known classes to perform synthetic-to-real domain adaptation, but applied to synthetic samples of the new classes. Using the Tsinghua dataset with a synthetic counterpart, SYNTHIA-TS, we have run an extensive set of experiments. The results show that the proposed method is indeed effective, provided that we use a proper Convolutional Neural Network (CNN) to perform the traffic sign recognition (classification) task as well as a proper GAN to transform the synthetic images. Here, a ResNet101-based classifier and domain adaptation based on CycleGAN performed extremely well for a ratio of ∼1/4 for new/known classes; even for more challenging ratios such as ∼4/1, the results are also very positive.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; ADAS; 600.118; 600.120			Approved	no
Call Number	Admin @ si @ VWL2020			Serial	3405



Author	Hugo Jair Escalante; Heysem Kaya; Albert Ali Salah; Sergio Escalera; Yagmur Gucluturk; Umut Guçlu; Xavier Baro; Isabelle Guyon; Julio C. S. Jacques Junior; Meysam Madadi; Stephane Ayache; Evelyne Viegas; Furkan Gurpinar; Achmadnoer Sukma Wicaksana; Cynthia Liem; Marcel A. J. Van Gerven; Rob Van Lier
Title	Modeling, Recognizing, and Explaining Apparent Personality from Videos			Type	Journal Article
Year	2022	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
Volume	13	Issue	2	Pages	894-911
Keywords
Abstract	Explainability and interpretability are two critical aspects of decision support systems. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of apparent personality recognition. To the best of our knowledge, this is the first effort in this direction. We describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, evaluation protocol, proposed solutions and summarize the results of the challenge. We investigate the issue of bias in detail. Finally, derived from our study, we outline research opportunities that we foresee will be relevant in this area in the near future.
Address	1 April-June 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no menciona			Approved	no
Call Number	Admin @ si @ EKS2022			Serial	3406