CVC Publications -- Query Results

Records
Author	Dustin Carrion Ojeda; Hong Chen; Adrian El Baz; Sergio Escalera; Chaoyu Guan; Isabelle Guyon; Ihsan Ullah; Xin Wang; Wenwu Zhu
Title	NeurIPS’22 Cross-Domain MetaDL competition: Design and baseline results			Type	Conference Article
Year	2022	Publication	Understanding Social Behavior in Dyadic and Small Group Interactions	Abbreviated Journal
Volume	191	Issue		Pages	24-37
Keywords
Abstract	We present the design and baseline results for a new challenge in the ChaLearn meta-learning series, accepted at NeurIPS'22, focusing on “cross-domain” meta-learning. Meta-learning aims to leverage experience gained from previous tasks to solve new tasks efficiently (i.e., with better performance, little training data, and/or modest computational resources). While previous challenges in the series focused on within-domain few-shot learning problems, with the aim of learning efficiently N-way k-shot tasks (i.e., N class classification problems with k training examples), this competition challenges the participants to solve “any-way” and “any-shot” problems drawn from various domains (healthcare, ecology, biology, manufacturing, and others), chosen for their humanitarian and societal impact. To that end, we created Meta-Album, a meta-dataset of 40 image classification datasets from 10 domains, from which we carve out tasks with any number of “ways” (within the range 2-20) and any number of “shots” (within the range 1-20). The competition is with code submission, fully blind-tested on the CodaLab challenge platform. The code of the winners will be open-sourced, enabling the deployment of automated machine learning solutions for few-shot image classification across several domains.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	PMLR
Notes	HUPBA; not mentioned			Approved	no
Call Number	Admin @ si @ CCB2022			Serial	3802



Author	Michael Teutsch; Angel Sappa; Riad I. Hammoud
Title	Cross-Spectral Image Processing			Type	Book Chapter
Year	2022	Publication	Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	23-34
Keywords
Abstract	Although this book is on IR computer vision and its main focus lies on IR image and video processing and analysis, special attention is dedicated to cross-spectral image processing due to the increasing number of publications and applications in this domain. In these cross-spectral frameworks, IR information is used together with information from other spectral bands to tackle some specific problems by developing more robust solutions. Tasks considered for cross-spectral processing are, for instance, dehazing, segmentation, vegetation index estimation, or face recognition. This increasing number of applications is motivated by cross- and multi-spectral camera setups already available on the market, such as smartphones, remote sensing multispectral cameras, or multi-spectral cameras for automotive systems or drones. In this chapter, different cross-spectral image processing techniques will be reviewed together with possible applications. Initially, image registration approaches for the cross-spectral case are reviewed: the registration stage is the first image processing task, which is needed to align images acquired by different sensors within the same reference coordinate system. Then, recent cross-spectral image colorization approaches, which are intended to colorize infrared images for different applications, are presented. Finally, the cross-spectral image enhancement problem is tackled by including guided super resolution techniques, image dehazing approaches, cross-spectral filtering and edge detection. Figure 3.1 illustrates cross-spectral image processing stages as well as their possible connections. Table 3.1 presents some of the available public cross-spectral datasets generally used as reference data to evaluate cross-spectral image registration, colorization, enhancement, or exploitation results.
Address
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	SLCV
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-00698-2	Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ TSH2022b			Serial	3805



Author	Michael Teutsch; Angel Sappa; Riad I. Hammoud
Title	Detection, Classification, and Tracking			Type	Book Chapter
Year	2022	Publication	Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	35-58
Keywords
Abstract	Automatic image and video exploitation or content analysis is a technique to extract higher-level information from a scene such as objects, behavior, (inter-)actions, environment, or even weather conditions. The relevant information is assumed to be contained in the two-dimensional signal provided in an image (width and height in pixels) or the three-dimensional signal provided in a video (width, height, and time). But also intermediate-level information such as object classes [196], locations [197], or motion [198] can help applications to fulfill certain tasks such as intelligent compression [199], video summarization [200], or video retrieval [201]. Usually, videos with their temporal dimension are a richer source of data compared to single images [202] and thus certain video content can be extracted from videos only such as object motion or object behavior. Often, machine learning or nowadays deep learning techniques are utilized to model prior knowledge about object or scene appearance using labeled training samples [203, 204]. After a learning phase, these models are then applied in real world applications, which is called inference.
Address
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	SLCV
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-00698-2	Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ TSH2022c			Serial	3806



Author	Michael Teutsch; Angel Sappa; Riad I. Hammoud
Title	Image and Video Enhancement			Type	Book Chapter
Year	2022	Publication	Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	9-21
Keywords
Abstract	Image and video enhancement aims at improving the signal quality relative to imaging artifacts such as noise and blur or atmospheric perturbations such as turbulence and haze. It is usually performed in order to assist humans in analyzing image and video content or simply to present humans visually appealing images and videos. However, image and video enhancement can also be used as a preprocessing technique to ease the task and thus improve the performance of subsequent automatic image content analysis algorithms: preceding dehazing can improve object detection as shown by [23] or explicit turbulence modeling can improve moving object detection as discussed by [24]. But it remains an open question whether image and video enhancement should rather be performed explicitly as a preprocessing step or implicitly for example by feeding affected images directly to a neural network for image content analysis like object detection [25]. Especially for real-time video processing at low latency it can be better to handle image perturbation implicitly in order to minimize the processing time of an algorithm. This can be achieved by making algorithms for image content analysis robust or even invariant to perturbations such as noise or blur. Additionally, mistakes of an individual preprocessing module can obviously affect the quality of the entire processing pipeline.
Address
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	SLCV
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ TSH2022a			Serial	3807



Author	Alex Falcon; Swathikiran Sudhakaran; Giuseppe Serra; Sergio Escalera; Oswald Lanz
Title	Relevance-based Margin for Contrastively-trained Video Retrieval Models			Type	Conference Article
Year	2022	Publication	ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval	Abbreviated Journal
Volume		Issue		Pages	146-157
Keywords
Abstract	Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access in private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space by putting similar items close and dissimilar items far. This framework leads to competitive recall rates, as they solely focus on the rank of the groundtruth items. Yet, assessing the quality of the ranking list is of utmost importance when considering intelligent retrieval systems, since multiple items may share similar semantics, hence a high relevance. Moreover, the aforementioned framework uses a fixed margin to separate similar and dissimilar items, treating all non-groundtruth items as equally irrelevant. In this paper we propose to use a variable margin: we argue that varying the margin used during training based on how relevant an item is to a given query, i.e. a relevance-based margin, easily improves the quality of the ranking lists measured through nDCG and mAP. We demonstrate the advantages of our technique using different models on EPIC-Kitchens-100 and YouCook2. We show that even if we carefully tuned the fixed margin, our technique (which does not have the margin as a hyper-parameter) would still achieve better performance. Finally, extensive ablation studies and qualitative analysis support the robustness of our approach. Code will be released at https://github.com/aranciokov/RelevanceMargin-ICMR22.
Address	Newark, NJ, USA, 27 June 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICMR
Notes	HuPBA; not mentioned			Approved	no
Call Number	Admin @ si @ FSS2022			Serial	3808



Author	Utkarsh Porwal; Alicia Fornes; Faisal Shafait (eds)
Title	Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition. 18th International Conference, ICFHR 2022			Type	Book Whole
Year	2022	Publication	Frontiers in Handwriting Recognition.	Abbreviated Journal
Volume	13639	Issue		Pages
Keywords
Abstract
Address	ICFHR 2022, Hyderabad, India, December 4–7, 2022
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor	Utkarsh Porwal; Alicia Fornes; Faisal Shafait
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-21648-0	Medium
Area		Expedition		Conference	ICFHR
Notes	DAG			Approved	no
Call Number	Admin @ si @ PFS2022			Serial	3809



Author	Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca
Title	Human Body Pose Estimation in Multi-view Environments			Type	Book Chapter
Year	2022	Publication	ICT Applications for Smart Cities. Intelligent Systems Reference Library	Abbreviated Journal
Volume	224	Issue		Pages	79-99
Keywords
Abstract	This chapter tackles the challenging problem of human pose estimation in multi-view environments to handle scenes with self-occlusions. The proposed approach starts by first estimating the camera pose (extrinsic parameters) in multi-view scenarios; because few real image datasets are available, different virtual scenes are generated by using a special simulator, for training and testing the proposed convolutional neural network based approaches. Then, these extrinsic parameters are used to establish the relation between different cameras in the multi-view scheme, which captures the pose of the person from different points of view at the same time. The proposed multi-view scheme allows robust estimation of human body joint positions even in situations where they are occluded. This would help to avoid possible false alarms in behavioral analysis systems of smart cities, as well as in applications for physical therapy and safe moving assistance for the elderly, among others. The chapter concludes by presenting experimental results in real scenes by using state-of-the-art and the proposed multi-view approaches.
Address	September 2022
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	ISRL
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-06306-0	Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ CSV2022b			Serial	3810



Author	Henry Velesaca; Patricia Suarez; Dario Carpio; Rafael E. Rivadeneira; Angel Sanchez; Angel Morera
Title	Video Analytics in Urban Environments: Challenges and Approaches			Type	Book Chapter
Year	2022	Publication	ICT Applications for Smart Cities. Intelligent Systems Reference Library	Abbreviated Journal
Volume	224	Issue		Pages	101-121
Keywords
Abstract	This chapter reviews state-of-the-art approaches generally present in the pipeline of video analytics on urban scenarios. A typical pipeline is used to cluster approaches in the literature, including image preprocessing, object detection, object classification, and object tracking modules. Then, a review of recent approaches for each module is given. Additionally, applications and datasets generally used for training and evaluating the performance of these approaches are included. This chapter is not intended to be an exhaustive review of state-of-the-art video analytics in urban environments but rather an illustration of some of the different recent contributions. The chapter concludes by presenting current trends in video analytics in the urban scenario field.
Address	September 2022
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	ISRL
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-06306-0	Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ VSC2022			Serial	3811



Author	Victoria Ruiz; Angel Sanchez; Jose F. Velez; Bogdan Raducanu
Title	Waste Classification with Small Datasets and Limited Resources			Type	Book Chapter
Year	2022	Publication	ICT Applications for Smart Cities. Intelligent Systems Reference Library	Abbreviated Journal
Volume	224	Issue		Pages	185-203
Keywords
Abstract	Automatic waste recycling has become a very important societal challenge nowadays, raising people’s awareness for a cleaner environment and a more sustainable lifestyle. With the transition to Smart Cities, and thanks to advanced ICT solutions, this problem has received a new impulse. The waste recycling focus has shifted from general waste treating facilities to an individual responsibility, where each person should become aware of selective waste separation. The surge of mobile devices, accompanied by a significant increase in computation power, has enabled and facilitated this individual role. An automated image-based waste classification mechanism can help with more efficient recycling and a reduction of contamination from residuals. Despite the good results achieved with deep learning methodologies for this task, the Achilles’ heel is that they require large neural networks which need significant computational resources for training and therefore are not suitable for mobile devices. To circumvent this apparently intractable problem, we will rely on knowledge distillation in order to transfer the network’s knowledge from a larger network (called ‘teacher’) to a smaller, more compact one (referred to as ‘student’), thus making image classification possible on a device with limited resources. For evaluation, we considered as ‘teachers’ large architectures such as InceptionResNet or DenseNet and as ‘students’ several configurations of the MobileNets. We used the publicly available TrashNet dataset to demonstrate that the distillation process does not significantly affect the performance (e.g. classification accuracy) of the student network.
Address	September 2022
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	ISRL
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-06306-0	Medium
Area		Expedition		Conference
Notes	LAMP			Approved	no
Call Number	Admin @ si @			Serial	3813



Author	Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
Title	Local Prediction Aggregation: A Frustratingly Easy Source-free Domain Adaptation Method			Type	Miscellaneous
Year	2022	Publication	arXiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method. Code is available in this https URL.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ YWW2022b			Serial	3815



Author	Ali Furkan Biten; Ruben Tito; Lluis Gomez; Ernest Valveny; Dimosthenis Karatzas
Title	OCR-IDL: OCR Annotations for Industry Document Library Dataset			Type	Conference Article
Year	2022	Publication	ECCV Workshop on Text in Everything	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Pretraining has proven successful in Document Intelligence tasks where a deluge of documents is used to pretrain the models, only later to be finetuned on downstream tasks. One of the problems of the pretraining approaches is the inconsistent usage of pretraining data with different OCR engines, leading to incomparable results between models. In other words, it is not obvious whether the performance gain comes from differences in the amount of data and the OCR engines used or from the proposed models. To remedy the problem, we make public the OCR annotations for IDL documents using a commercial OCR engine, given its superior performance over open source OCR models. The contributed dataset (OCR-IDL) has an estimated monetary value over 20K US$. It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence. All of our data and its collection process with the annotations can be found in this https URL.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ECCV
Notes	DAG; no proj			Approved	no
Call Number	Admin @ si @ BTG2022			Serial	3817



Author	Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
Title	One Ring to Bring Them All: Towards Open-Set Recognition under Domain Shift			Type	Miscellaneous
Year	2022	Publication	arXiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper, we investigate model adaptation under domain and category shift, where the final goal is to achieve source-free universal domain adaptation (SF-UNDA), which addresses the situation where there exist both domain and category shifts between source and target domains. Under the SF-UNDA setting, the model cannot access source data anymore during target adaptation, which aims to address data privacy concerns. We propose a novel training scheme to learn an (n+1)-way classifier to predict the n source classes and the unknown class, where samples of only known source categories are available for training. Furthermore, for target adaptation, we simply adopt a weighted entropy minimization to adapt the source pretrained model to the unlabeled target domain without source data. In experiments, we show: After source training, the resulting source model can get excellent performance for ; After target adaptation, our method surpasses current UNDA approaches which demand source data during adaptation. The versatility to several different tasks strongly proves the efficacy and generalization ability of our method. When augmented with a closed-set domain adaptation approach during target adaptation, our source-free method further outperforms the current state-of-the-art UNDA method by 2.5%, 7.2% and 13% on Office-31, Office-Home and VisDA respectively.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; no proj			Approved	no
Call Number	Admin @ si @ YWW2022c			Serial	3818



Author	Saiping Zhang; Luis Herranz; Marta Mrak; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang
Title	PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation- and Attention-based Network			Type	Miscellaneous
Year	2022	Publication	arXiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for enhancing perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. Then extracted features are fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module which leverages the corresponding QP information. In this way, a single model can be used to enhance adaptively to various QPs without requiring multiple models specific for every QP value, while having similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MACO; no proj			Approved	no
Call Number	Admin @ si @ ZHM2022b			Serial	3819



Author	Ruben Ballester; Xavier Arnal Clemente; Carles Casacuberta; Meysam Madadi; Ciprian Corneanu
Title	Towards explaining the generalization gap in neural networks using topological data analysis			Type	Miscellaneous
Year	2022	Publication	arXiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Understanding how neural networks generalize on unseen data is crucial for designing more robust and reliable models. In this paper, we study the generalization gap of neural networks using methods from topological data analysis. For this purpose, we compute homological persistence diagrams of weighted graphs constructed from neuron activation correlations after a training phase, aiming to capture patterns that are linked to the generalization capacity of the network. We compare the usefulness of different numerical summaries from persistence diagrams and show that a combination of some of them can accurately predict and partially explain the generalization gap without the need of a test set. Evaluation on two computer vision recognition tasks (CIFAR10 and SVHN) shows competitive generalization gap prediction when compared against state-of-the-art methods.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HUPBA; not mentioned			Approved	no
Call Number	Admin @ si @ BAC2022			Serial	3821



Author	Arya Farkhondeh; Cristina Palmero; Simone Scardapane; Sergio Escalera
Title	Towards Self-Supervised Gaze Estimation			Type	Miscellaneous
Year	2022	Publication	arXiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Recent joint embedding-based self-supervised methods have surpassed standard supervised approaches on various image recognition tasks such as image classification. These self-supervised methods aim at maximizing agreement between features extracted from two differently transformed views of the same image, which results in learning an invariant representation with respect to appearance and geometric image transformations. However, the effectiveness of these approaches remains unclear in the context of gaze estimation, a structured regression task that requires equivariance under geometric transformations (e.g., rotations, horizontal flip). In this work, we propose SwAT, an equivariant version of the online clustering-based self-supervised approach SwAV, to learn more informative representations for gaze estimation. We demonstrate that SwAT, with ResNet-50 and supported with uncurated unlabeled face images, outperforms state-of-the-art gaze estimation methods and supervised baselines in various experiments. In particular, we achieve up to 57% and 25% improvements in cross-dataset and within-dataset evaluation tasks on existing benchmarks (ETH-XGaze, Gaze360, and MPIIFaceGaze).
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HUPBA; not mentioned			Approved	no
Call Number	Admin @ si @ FPS2022			Serial	3822