|   | 
Details
   web
Records
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 139 Issue Pages 109419
Keywords
Abstract (up) Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space}. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ BMC2023 Serial 3826
Permanent link to this record
 

 
Author Albin Soutif; Antonio Carta; Joost Van de Weijer
Title Improving Online Continual Learning Performance and Stability with Temporal Ensembles Type Conference Article
Year 2023 Publication 2nd Conference on Lifelong Learning Agents Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) arXiv:2205.13452 showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspirations from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.
Address Montreal; Canada; August 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference COLLAS
Notes LAMP Approved no
Call Number Admin @ si @ SCW2023 Serial 3922
Permanent link to this record
 

 
Author Chengyi Zou; Shuai Wan; Tiannan Ji; Marc Gorriz Blanch; Marta Mrak; Luis Herranz
Title Chroma Intra Prediction with Lightweight Attention-Based Neural Networks Type Journal Article
Year 2023 Publication IEEE Transactions on Circuits and Systems for Video Technology Abbreviated Journal TCSVT
Volume 34 Issue 1 Pages 549 - 560
Keywords
Abstract (up) Neural networks can be successfully used for cross-component prediction in video coding. In particular, attention-based architectures are suitable for chroma intra prediction using luma information because of their capability to model relations between difierent channels. However, the complexity of such methods is still very high and should be further reduced, especially for decoding. In this paper, a cost-effective attention-based neural network is designed for chroma intra prediction. Moreover, with the goal of further improving coding performance, a novel approach is introduced to utilize more boundary information effectively. In addition to improving prediction, a simplification methodology is also proposed to reduce inference complexity by simplifying convolutions. The proposed schemes are integrated into H.266/Versatile Video Coding (VVC) pipeline, and only one additional binary block-level syntax flag is introduced to indicate whether a given block makes use of the proposed method. Experimental results demonstrate that the proposed scheme achieves up to −0.46%/−2.29%/−2.17% BD-rate reduction on Y/Cb/Cr components, respectively, compared with H.266/VVC anchor. Reductions in the encoding and decoding complexity of up to 22% and 61%, respectively, are achieved by the proposed scheme with respect to the previous attention-based chroma intra prediction method while maintaining coding performance.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MACO; LAMP Approved no
Call Number Admin @ si @ ZWJ2023 Serial 3875
Permanent link to this record
 

 
Author Roger Max Calle Quispe; Maya Aghaei Gavari; Eduardo Aguilar Torres
Title Towards real-time accurate safety helmets detection through a deep learning-based method Type Journal
Year 2023 Publication Ingeniare. Revista chilena de ingenieria Abbreviated Journal
Volume 31 Issue 12 Pages
Keywords
Abstract (up) Occupational safety is a fundamental activity in industries and revolves around the management of the necessary controls that must be present to mitigate occupational risks. These controls include verifying the use of Personal Protection Equipment (PPE). Within PPE, safety helmets are vital to reducing severe or fatal consequences caused by head injuries. This problem has been addressed recently by various research based on deep learning to detect the usage of safety helmets by the present people in the industrial field.

These works have achieved promising results for safety helmet detection using object detection methods from the YOLO family. In this work, we propose to analyze the performance of Scaled-YOLOv4, a novel model of the YOLO family that has yet to be previously studied for this problem. The performance of the Scaled-YOLOv4 is evaluated on two public databases, carefully selected among the previously proposed datasets for the occupational safety framework. We demonstrate the superiority of Scaled-YOLOv4 in terms of mAP and Fl-score concerning the previous works for both databases. Further, we summarize the currently available datasets for safety helmet detection purposes and discuss their suitability.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ CAA2023 Serial 3846
Permanent link to this record
 

 
Author Yi Xiao; Felipe Codevilla; Diego Porres; Antonio Lopez
Title Scaling Vision-Based End-to-End Autonomous Driving with Multi-View Attention Learning Type Conference Article
Year 2023 Publication International Conference on Intelligent Robots and Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) On end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some latest models achieve better performance than CILRS by using expensive sensor suites and/or by using large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS by both processing higher-resolution images using a human-inspired HFOV as an inductive bias and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.
Address Detroit; USA; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IROS
Notes ADAS Approved no
Call Number Admin @ si @ XCP2023 Serial 3930
Permanent link to this record
 

 
Author Albin Soutif; Antonio Carta; Andrea Cossu; Julio Hurtado; Hamed Hemati; Vincenzo Lomonaco; Joost Van de Weijer
Title A Comprehensive Empirical Evaluation on Online Continual Learning Type Conference Article
Year 2023 Publication Visual Continual Learning (ICCV-W) Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Online continual learning aims to get closer to a live learning experience by learning directly on a stream of data with temporally shifting distribution and by storing a minimum amount of data from that stream. In this empirical evaluation, we evaluate various methods from the literature that tackle online continual learning. More specifically, we focus on the class-incremental setting in the context of image classification, where the learner must learn new classes incrementally from a stream of data. We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks, and measure their average accuracy, forgetting, stability, and quality of the representations, to evaluate various aspects of the algorithm at the end but also during the whole training period. We find that most methods suffer from stability and underfitting issues. However, the learned representations are comparable to i.i.d. training under the same computational budget. No clear winner emerges from the results and basic experience replay, when properly tuned and implemented, is a very strong baseline. We release our modular and extensible codebase at this https URL based on the avalanche framework to reproduce our results and encourage future research.
Address Paris; France; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes LAMP Approved no
Call Number Admin @ si @ SCC2023 Serial 3938
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez
Title Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules Type Conference Article
Year 2023 Publication 37th International Congress and Exhibition is organized by Computer Assisted Radiology and Surgery Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Pòster
Address Munich; Germany; June 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CARS
Notes IAM Approved no
Call Number Admin @ si @ TGR2023a Serial 3950
Permanent link to this record
 

 
Author Sonia Baeza; Debora Gil; Carles Sanchez; Guillermo Torres; Ignasi Garcia Olive; Ignasi Guasch; Samuel Garcia Reina; Felipe Andreo; Jose Luis Mate; Jose Luis Vercher; Antonio Rosell
Title Biopsia virtual radiomica para el diagnóstico histológico de nódulos pulmonares – Resultados intermedios del proyecto Radiolung Type Conference Article
Year 2023 Publication SEPAR Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Pòster
Address Granada; Spain; June 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SEPAR
Notes IAM Approved no
Call Number Admin @ si @ BGS2023 Serial 3951
Permanent link to this record
 

 
Author Debora Gil; Guillermo Torres; Carles Sanchez
Title Transforming radiomic features into radiological words Type Conference Article
Year 2023 Publication IEEE International Symposium on Biomedical Imaging Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Pòster
Address Cartagena de Indias; Colombia; April 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ISBI
Notes IAM Approved no
Call Number Admin @ si @ GTS2023 Serial 3952
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil; Antonio Rosell; Sonia Baeza; Carles Sanchez
Title A radiomic biopsy for virtual histology of pulmonary nodules Type Conference Article
Year 2023 Publication IEEE International Symposium on Biomedical Imaging Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Pòster
Address Cartagena de Indias; Colombia; April 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ISBI
Notes IAM Approved no
Call Number Admin @ si @ TGR2023b Serial 3954
Permanent link to this record
 

 
Author Roberto Morales; Juan Quispe; Eduardo Aguilar
Title Exploring multi-food detection using deep learning-based algorithms Type Conference Article
Year 2023 Publication 13th International Conference on Pattern Recognition Systems Abbreviated Journal
Volume Issue Pages 1-7
Keywords
Abstract (up) People are becoming increasingly concerned about their diet, whether for disease prevention, medical treatment or other purposes. In meals served in restaurants, schools or public canteens, it is not easy to identify the ingredients and/or the nutritional information they contain. Currently, technological solutions based on deep learning models have facilitated the recording and tracking of food consumed based on the recognition of the main dish present in an image. Considering that sometimes there may be multiple foods served on the same plate, food analysis should be treated as a multi-class object detection problem. EfficientDet and YOLOv5 are object detection algorithms that have demonstrated high mAP and real-time performance on general domain data. However, these models have not been evaluated and compared on public food datasets. Unlike general domain objects, foods have more challenging features inherent in their nature that increase the complexity of detection. In this work, we performed a performance evaluation of Efficient-Det and YOLOv5 on three public food datasets: UNIMIB2016, UECFood256 and ChileanFood64. From the results obtained, it can be seen that YOLOv5 provides a significant difference in terms of both mAP and response time compared to EfficientDet in all datasets. Furthermore, YOLOv5 outperforms the state-of-the-art on UECFood256, achieving an improvement of more than 4% in terms of mAP@.50 over the best reported.
Address Guayaquil; Ecuador; July 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPRS
Notes MILAB Approved no
Call Number Admin @ si @ MQA2023 Serial 3843
Permanent link to this record
 

 
Author Dipam Goswami; Yuyang Liu ; Bartlomiej Twardowski; Joost Van de Weijer
Title FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning Type Conference Article
Year 2023 Publication 37th Annual Conference on Neural Information Processing Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Poster
Address New Orleans; USA; December 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NEURIPS
Notes LAMP Approved no
Call Number Admin @ si @ GLT2023 Serial 3934
Permanent link to this record
 

 
Author Kai Wang; Fei Yang; Shiqi Yang; Muhammad Atif Butt; Joost Van de Weijer
Title Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing Type Conference Article
Year 2023 Publication 37th Annual Conference on Neural Information Processing Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Poster
Address New Orleans; USA; December 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NEURIPS
Notes LAMP Approved no
Call Number Admin @ si @ WYY2023 Serial 3935
Permanent link to this record
 

 
Author Wenwen Yu; Mingyu Liu; Mingrui Chen; Ning Lu; Yinlong We; Yuliang Liu; Dimosthenis Karatzas; Xiang Bai
Title ICDAR 2023 Competition on Reading the Seal Title Type Conference Article
Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 14188 Issue Pages 522–535
Keywords
Abstract (up) Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants and received 135 submissions from academia and industry, including 28 participants and 72 submissions for Task 1, and 25 participants and 63 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology.
Address San Jose; CA; USA; August 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ YLC2023 Serial 3897
Permanent link to this record
 

 
Author Senmao Li; Joost Van de Weijer; Yaxing Wang; Fahad Shahbaz Khan; Meiqin Liu; Jian Yang
Title 3D-Aware Multi-Class Image-to-Image Translation with NeRFs Type Conference Article
Year 2023 Publication 36th IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 12652-12662
Keywords
Abstract (up) Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However no prior works investigate 3D-aware GANs for 3D consistent multiclass image-to-image (3D-aware 121) translation. Naively using 2D-121 translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass 121 translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware 121 translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture, that preserves view-consistency, we construct a 3D-aware 121 translation system. To further reduce the view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constrain and a relative regularization loss. In exten-sive experiments on two datasets, quantitative and qualitative results demonstrate that we successfully perform 3D-aware 121 translation with multi-view consistency. Code is available in 3DI2I.
Address Vancouver; Canada; June 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPR
Notes LAMP Approved no
Call Number Admin @ si @ LWW2023b Serial 3920
Permanent link to this record