|   | 
Details
   web
Records
Author Senmao Li; Joost Van de Weijer; Yaxing Wang; Fahad Shahbaz Khan; Meiqin Liu; Jian Yang
Title (up) 3D-Aware Multi-Class Image-to-Image Translation with NeRFs Type Conference Article
Year 2023 Publication 36th IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 12652-12662
Keywords
Abstract Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However no prior works investigate 3D-aware GANs for 3D consistent multiclass image-to-image (3D-aware 121) translation. Naively using 2D-121 translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass 121 translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware 121 translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture, that preserves view-consistency, we construct a 3D-aware 121 translation system. To further reduce the view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constrain and a relative regularization loss. In exten-sive experiments on two datasets, quantitative and qualitative results demonstrate that we successfully perform 3D-aware 121 translation with multi-view consistency. Code is available in 3DI2I.
Address Vancouver; Canada; June 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPR
Notes LAMP Approved no
Call Number Admin @ si @ LWW2023b Serial 3920
Permanent link to this record
 

 
Author Albin Soutif; Antonio Carta; Andrea Cossu; Julio Hurtado; Hamed Hemati; Vincenzo Lomonaco; Joost Van de Weijer
Title (up) A Comprehensive Empirical Evaluation on Online Continual Learning Type Conference Article
Year 2023 Publication Visual Continual Learning (ICCV-W) Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Online continual learning aims to get closer to a live learning experience by learning directly on a stream of data with temporally shifting distribution and by storing a minimum amount of data from that stream. In this empirical evaluation, we evaluate various methods from the literature that tackle online continual learning. More specifically, we focus on the class-incremental setting in the context of image classification, where the learner must learn new classes incrementally from a stream of data. We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks, and measure their average accuracy, forgetting, stability, and quality of the representations, to evaluate various aspects of the algorithm at the end but also during the whole training period. We find that most methods suffer from stability and underfitting issues. However, the learned representations are comparable to i.i.d. training under the same computational budget. No clear winner emerges from the results and basic experience replay, when properly tuned and implemented, is a very strong baseline. We release our modular and extensible codebase at this https URL based on the avalanche framework to reproduce our results and encourage future research.
Address Paris; France; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes LAMP Approved no
Call Number Admin @ si @ SCC2023 Serial 3938
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
Title (up) A deep co-attentive hand-based video question answering framework using multi-view skeleton Type Journal Article
Year 2023 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 82 Issue Pages 1401–1429
Keywords
Abstract In this paper, we present a novel hand –based Video Question Answering framework, entitled Multi-View Video Question Answering (MV-VQA), employing the Single Shot Detector (SSD), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Co-Attention mechanism with RGB videos as the inputs. Our model includes three main blocks: vision, language, and attention. In the vision block, we employ a novel representation to obtain some efficient multiview features from the hand object using the combination of five 3DCNNs and one LSTM network. To obtain the question embedding, we use the BERT model in language block. Finally, we employ a co-attention mechanism on vision and language features to recognize the final answer. For the first time, we propose such a hand-based Video-QA framework including the multi-view hand skeleton features combined with the question embedding and co-attention mechanism. Our framework is capable of processing the arbitrary numbers of questions in the dataset annotations. There are different application domains for this framework. Here, as an application domain, we applied our framework to dynamic hand gesture recognition for the first time. Since the main object in dynamic hand gesture recognition is the human hand, we performed a step-by-step analysis of the hand detection and multi-view hand skeleton impact on the model performance. Evaluation results on five datasets, including two datasets in VideoQA, two datasets in dynamic hand gesture, and one dataset in hand action recognition show that MV-VQA outperforms state-of-the-art alternatives.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ RKE2023b Serial 3881
Permanent link to this record
 

 
Author Jaykishan Patel; Alban Flachot; Javier Vazquez; David H. Brainard; Thomas S. A. Wallis; Marcus A. Brubaker; Richard F. Murray
Title (up) A deep convolutional neural network trained to infer surface reflectance is deceived by mid-level lightness illusions Type Journal Article
Year 2023 Publication Journal of Vision Abbreviated Journal JV
Volume 23 Issue 9 Pages 4817-4817
Keywords
Abstract A long-standing view is that lightness illusions are by-products of strategies employed by the visual system to stabilize its perceptual representation of surface reflectance against changes in illumination. Computationally, one such strategy is to infer reflectance from the retinal image, and to base the lightness percept on this inference. CNNs trained to infer reflectance from images have proven successful at solving this problem under limited conditions. To evaluate whether these CNNs provide suitable starting points for computational models of human lightness perception, we tested a state-of-the-art CNN on several lightness illusions, and compared its behaviour to prior measurements of human performance. We trained a CNN (Yu & Smith, 2019) to infer reflectance from luminance images. The network had a 30-layer hourglass architecture with skip connections. We trained the network via supervised learning on 100K images, rendered in Blender, each showing randomly placed geometric objects (surfaces, cubes, tori, etc.), with random Lambertian reflectance patterns (solid, Voronoi, or low-pass noise), under randomized point+ambient lighting. The renderer also provided the ground-truth reflectance images required for training. After training, we applied the network to several visual illusions. These included the argyle, Koffka-Adelson, snake, White’s, checkerboard assimilation, and simultaneous contrast illusions, along with their controls where appropriate. The CNN correctly predicted larger illusions in the argyle, Koffka-Adelson, and snake images than in their controls. It also correctly predicted an assimilation effect in White's illusion. It did not, however, account for the checkerboard assimilation or simultaneous contrast effects. These results are consistent with the view that at least some lightness phenomena are by-products of a rational approach to inferring stable representations of physical properties from intrinsically ambiguous retinal images. Furthermore, they suggest that CNN models may be a promising starting point for new models of human lightness perception.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MACO; CIC Approved no
Call Number Admin @ si @ PFV2023 Serial 3890
Permanent link to this record
 

 
Author Patricia Suarez; Dario Carpio; Angel Sappa
Title (up) A Deep Learning Based Approach for Synthesizing Realistic Depth Maps Type Conference Article
Year 2023 Publication 22nd International Conference on Image Analysis and Processing Abbreviated Journal
Volume 14234 Issue Pages 369–380
Keywords
Abstract This paper presents a novel cycle generative adversarial network (CycleGAN) architecture for synthesizing high-quality depth maps from a given monocular image. The proposed architecture uses multiple loss functions, including cycle consistency, contrastive, identity, and least square losses, to enable the generation of realistic and high-fidelity depth maps. The proposed approach addresses this challenge by synthesizing depth maps from RGB images without requiring paired training data. Comparisons with several state-of-the-art approaches are provided showing the proposed approach overcome other approaches both in terms of quantitative metrics and visual quality.
Address Udine; Italia; Setember 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICIAP
Notes MSIAU Approved no
Call Number Admin @ si @ SCS2023a Serial 3968
Permanent link to this record
 

 
Author P. Canals; Simone Balocco; O. Diaz; J. Li; A. Garcia Tornel; M. Olive Gadea; M. Ribo
Title (up) A fully automatic method for vascular tortuosity feature extraction in the supra-aortic region: unraveling possibilities in stroke treatment planning Type Journal Article
Year 2023 Publication Computerized Medical Imaging and Graphics Abbreviated Journal CMIG
Volume 104 Issue 102170 Pages
Keywords Artificial intelligence; Deep learning; Stroke; Thrombectomy; Vascular feature extraction; Vascular tortuosity
Abstract Vascular tortuosity of supra-aortic vessels is widely considered one of the main reasons for failure and delays in endovascular treatment of large vessel occlusion in patients with acute ischemic stroke. Characterization of tortuosity is a challenging task due to the lack of objective, robust and effective analysis tools. We present a fully automatic method for arterial segmentation, vessel labelling and tortuosity feature extraction applied to the supra-aortic region. A sample of 566 computed tomography angiography scans from acute ischemic stroke patients (aged 74.8 ± 12.9, 51.0% females) were used for training, validation and testing of a segmentation module based on a U-Net architecture (162 cases) and a vessel labelling module powered by a graph U-Net (566 cases). Successively, 30 cases were processed for testing of a tortuosity feature extraction module. Measurements obtained through automatic processing were compared to manual annotations from two observers for a thorough validation of the method. The proposed feature extraction method presented similar performance to the inter-rater variability observed in the measurement of 33 geometrical and morphological features of the arterial anatomy in the supra-aortic region. This system will contribute to the development of more complex models to advance the treatment of stroke by adding immediate automation, objectivity, repeatability and robustness to the vascular tortuosity characterization of patients.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ CBD2023 Serial 4005
Permanent link to this record
 

 
Author Cristhian A. Aguilera-Carrasco; Luis Felipe Gonzalez-Böhme; Francisco Valdes; Francisco Javier Quitral Zapata; Bogdan Raducanu
Title (up) A Hand-Drawn Language for Human–Robot Collaboration in Wood Stereotomy Type Journal Article
Year 2023 Publication IEEE Access Abbreviated Journal ACCESS
Volume 11 Issue Pages 100975 - 100985
Keywords
Abstract This study introduces a novel, hand-drawn language designed to foster human-robot collaboration in wood stereotomy, central to carpentry and joinery professions. Based on skilled carpenters’ line and symbol etchings on timber, this language signifies the location, geometry of woodworking joints, and timber placement within a framework. A proof-of-concept prototype has been developed, integrating object detectors, keypoint regression, and traditional computer vision techniques to interpret this language and enable an extensive repertoire of actions. Empirical data attests to the language’s efficacy, with the successful identification of a specific set of symbols on various wood species’ sawn surfaces, achieving a mean average precision (mAP) exceeding 90%. Concurrently, the system can accurately pinpoint critical positions that facilitate robotic comprehension of carpenter-indicated woodworking joint geometry. The positioning error, approximately 3 pixels, meets industry standards.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ AGV2023 Serial 3969
Permanent link to this record
 

 
Author Olivier Penacchio; Xavier Otazu; Arnold J Wilkings; Sara M. Haigh
Title (up) A mechanistic account of visual discomfort Type Journal Article
Year 2023 Publication Frontiers in Neuroscience Abbreviated Journal FN
Volume 17 Issue Pages
Keywords
Abstract Much of the neural machinery of the early visual cortex, from the extraction of local orientations to contextual modulations through lateral interactions, is thought to have developed to provide a sparse encoding of contour in natural scenes, allowing the brain to process efficiently most of the visual scenes we are exposed to. Certain visual stimuli, however, cause visual stress, a set of adverse effects ranging from simple discomfort to migraine attacks, and epileptic seizures in the extreme, all phenomena linked with an excessive metabolic demand. The theory of efficient coding suggests a link between excessive metabolic demand and images that deviate from natural statistics. Yet, the mechanisms linking energy demand and image spatial content in discomfort remain elusive. Here, we used theories of visual coding that link image spatial structure and brain activation to characterize the response to images observers reported as uncomfortable in a biologically based neurodynamic model of the early visual cortex that included excitatory and inhibitory layers to implement contextual influences. We found three clear markers of aversive images: a larger overall activation in the model, a less sparse response, and a more unbalanced distribution of activity across spatial orientations. When the ratio of excitation over inhibition was increased in the model, a phenomenon hypothesised to underlie interindividual differences in susceptibility to visual discomfort, the three markers of discomfort progressively shifted toward values typical of the response to uncomfortable stimuli. Overall, these findings propose a unifying mechanistic explanation for why there are differences between images and between observers, suggesting how visual input and idiosyncratic hyperexcitability give rise to abnormal brain responses that result in visual stress.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes NEUROBIT Approved no
Call Number Admin @ si @ POW2023 Serial 3886
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil; Antonio Rosell; Sonia Baeza; Carles Sanchez
Title (up) A radiomic biopsy for virtual histology of pulmonary nodules Type Conference Article
Year 2023 Publication IEEE International Symposium on Biomedical Imaging Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Pòster
Address Cartagena de Indias; Colombia; April 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ISBI
Notes IAM Approved no
Call Number Admin @ si @ TGR2023b Serial 3954
Permanent link to this record
 

 
Author Wenwen Fu; Zhihong An; Wendong Huang; Haoran Sun; Wenjuan Gong; Jordi Gonzalez
Title (up) A Spatio-Temporal Spotting Network with Sliding Windows for Micro-Expression Detection Type Journal Article
Year 2023 Publication Electronics Abbreviated Journal ELEC
Volume 12 Issue 18 Pages 3947
Keywords micro-expression spotting; sliding window; key frame extraction
Abstract Micro-expressions reveal underlying emotions and are widely applied in political psychology, lie detection, law enforcement and medical care. Micro-expression spotting aims to detect the temporal locations of facial expressions from video sequences and is a crucial task in micro-expression recognition. In this study, the problem of micro-expression spotting is formulated as micro-expression classification per frame. We propose an effective spotting model with sliding windows called the spatio-temporal spotting network. The method involves a sliding window detection mechanism, combines the spatial features from the local key frames and the global temporal features and performs micro-expression spotting. The experiments are conducted on the CAS(ME)2 database and the SAMM Long Videos database, and the results demonstrate that the proposed method outperforms the state-of-the-art method by 30.58% for the CAS(ME)2 and 23.98% for the SAMM Long Videos according to overall F-scores.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ FAH2023 Serial 3864
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Asma Bensalah; Jialuo Chen; Alicia Fornes; Michelle Waldispühl
Title (up) A User Perspective on HTR methods for the Automatic Transcription of Rare Scripts: The Case of Codex Runicus Just Accepted Type Journal Article
Year 2023 Publication ACM Journal on Computing and Cultural Heritage Abbreviated Journal JOCCH
Volume 15 Issue 4 Pages 1-18
Keywords
Abstract Recent breakthroughs in Artificial Intelligence, Deep Learning and Document Image Analysis and Recognition have significantly eased the creation of digital libraries and the transcription of historical documents. However, for documents in rare scripts with few labelled training data available, current Handwritten Text Recognition (HTR) systems are too constraint. Moreover, research on HTR often focuses on technical aspects only, and rarely puts emphasis on implementing software tools for scholars in Humanities. In this article, we describe, compare and analyse different transcription methods for rare scripts. We evaluate their performance in a real use case of a medieval manuscript written in the runic script (Codex Runicus) and discuss advantages and disadvantages of each method from the user perspective. From this exhaustive analysis and comparison with a fully manual transcription, we raise conclusions and provide recommendations to scholars interested in using automatic transcription tools.
Address
Corporate Author Thesis
Publisher ACM Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no
Call Number Admin @ si @ SBC2023 Serial 3732
Permanent link to this record
 

 
Author Sergi Garcia Bordils; Dimosthenis Karatzas; Marçal Rusiñol
Title (up) Accelerating Transformer-Based Scene Text Detection and Recognition via Token Pruning Type Conference Article
Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 14192 Issue Pages 106-121
Keywords Scene Text Detection; Scene Text Recognition; Transformer Acceleration
Abstract Scene text detection and recognition is a crucial task in computer vision with numerous real-world applications. Transformer-based approaches are behind all current state-of-the-art models and have achieved excellent performance. However, the computational requirements of the transformer architecture makes training these methods slow and resource heavy. In this paper, we introduce a new token pruning strategy that significantly decreases training and inference times without sacrificing performance, striking a balance between accuracy and speed. We have applied this pruning technique to our own end-to-end transformer-based scene text understanding architecture. Our method uses a separate detection branch to guide the pruning of uninformative image features, which significantly reduces the number of tokens at the input of the transformer. Experimental results show how our network is able to obtain competitive results on multiple public benchmarks while running at significantly higher speeds.
Address San Jose; CA; USA; August 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ GKR2023a Serial 3907
Permanent link to this record
 

 
Author Filip Szatkowski; Mateusz Pyla; Marcin Przewięzlikowski; Sebastian Cygert; Bartłomiej Twardowski; Tomasz Trzcinski
Title (up) Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-Free Continual Learning Type Conference Article
Year 2023 Publication Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops Abbreviated Journal
Volume Issue Pages 3512-3517
Keywords
Abstract In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy, aiming to prevent forgetting. KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks. Our analysis reveals that this issue originates from substantial representation shifts in the teacher network when dealing with out-of-distribution data. This causes large errors in the KD loss component, leading to performance degradation in CIL. Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that concurrently updates the teacher and the main model during incremental training. Our method seamlessly integrates with KD-based CIL approaches and allows for consistent enhancement of their performance across multiple exemplar-free CIL benchmarks.
Address Paris; France; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes LAMP Approved no
Call Number Admin @ si @ Serial 3944
Permanent link to this record
 

 
Author Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li
Title (up) Advances in Face Presentation Attack Detection Type Book Whole
Year 2023 Publication Advances in Face Presentation Attack Detection Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ WGE2023a Serial 3955
Permanent link to this record
 

 
Author Yi Xiao
Title (up) Advancing Vision-based End-to-End Autonomous Driving Type Book Whole
Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In autonomous driving, artificial intelligence (AI) processes the traffic environment to drive the vehicle to a desired destination. Currently, there are different paradigms that address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that attempt to learn the direct mapping of raw data from input sensors to vehicle control signals. The latter are relatively less studied but are gaining popularity as they are less demanding in terms of data labeling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving.
We propose to evaluate three approaches to tackle the challenge of end-to-end
autonomous driving. First, we focus on the input, considering adding depth information as complementary to RGB data, in order to mimic the human being’s
ability to estimate the distance to obstacles. Notice that, in the real world, these depth maps can be obtained either from a LiDAR sensor, or a trained monocular
depth estimation module, where human labeling is not needed. Then, based on
the intuition that the latent space of end-to-end driving models encodes relevant
information for driving, we use it as prior knowledge for training an affordancebased driving model. In this case, the trained affordance-based model can achieve good performance while requiring less human-labeled data, and it can provide interpretability regarding driving actions. Finally, we present a new pure vision-based end-to-end driving model termed CIL++, which is trained by imitation learning.
CIL++ leverages modern best practices, such as a large horizontal field of view and
a self-attention mechanism, which are contributing to the agent’s understanding of
the driving scene and bringing a better imitation of human drivers. Using training
data without any human labeling, our model yields almost expert performance in
the CARLA NoCrash benchmark and could rival SOTA models that require large amounts of human-labeled data.
Address
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Antonio Lopez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-126409-4-6 Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Xia2023 Serial 3964
Permanent link to this record