toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Roberto Morales; Juan Quispe; Eduardo Aguilar edit  url
doi  openurl
  Title Exploring multi-food detection using deep learning-based algorithms Type Conference Article
  Year 2023 Publication 13th International Conference on Pattern Recognition Systems Abbreviated Journal  
  Volume Issue Pages 1-7  
  Keywords  
  Abstract (down) People are becoming increasingly concerned about their diet, whether for disease prevention, medical treatment or other purposes. In meals served in restaurants, schools or public canteens, it is not easy to identify the ingredients and/or the nutritional information they contain. Currently, technological solutions based on deep learning models have facilitated the recording and tracking of food consumed based on the recognition of the main dish present in an image. Considering that sometimes there may be multiple foods served on the same plate, food analysis should be treated as a multi-class object detection problem. EfficientDet and YOLOv5 are object detection algorithms that have demonstrated high mAP and real-time performance on general domain data. However, these models have not been evaluated and compared on public food datasets. Unlike general domain objects, foods have more challenging features inherent in their nature that increase the complexity of detection. In this work, we performed a performance evaluation of Efficient-Det and YOLOv5 on three public food datasets: UNIMIB2016, UECFood256 and ChileanFood64. From the results obtained, it can be seen that YOLOv5 provides a significant difference in terms of both mAP and response time compared to EfficientDet in all datasets. Furthermore, YOLOv5 outperforms the state-of-the-art on UECFood256, achieving an improvement of more than 4% in terms of mAP@.50 over the best reported.  
  Address Guayaquil; Ecuador; July 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPRS  
  Notes MILAB Approved no  
  Call Number Admin @ si @ MQA2023 Serial 3843  
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez edit  openurl
  Title Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules Type Conference Article
  Year 2023 Publication 37th International Congress and Exhibition is organized by Computer Assisted Radiology and Surgery Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) Pòster  
  Address Munich; Germany; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CARS  
  Notes IAM Approved no  
  Call Number Admin @ si @ TGR2023a Serial 3950  
Permanent link to this record
 

 
Author Sonia Baeza; Debora Gil; Carles Sanchez; Guillermo Torres; Ignasi Garcia Olive; Ignasi Guasch; Samuel Garcia Reina; Felipe Andreo; Jose Luis Mate; Jose Luis Vercher; Antonio Rosell edit  openurl
  Title Biopsia virtual radiomica para el diagnóstico histológico de nódulos pulmonares – Resultados intermedios del proyecto Radiolung Type Conference Article
  Year 2023 Publication SEPAR Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) Pòster  
  Address Granada; Spain; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SEPAR  
  Notes IAM Approved no  
  Call Number Admin @ si @ BGS2023 Serial 3951  
Permanent link to this record
 

 
Author Debora Gil; Guillermo Torres; Carles Sanchez edit  openurl
  Title Transforming radiomic features into radiological words Type Conference Article
  Year 2023 Publication IEEE International Symposium on Biomedical Imaging Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) Pòster  
  Address Cartagena de Indias; Colombia; April 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ISBI  
  Notes IAM Approved no  
  Call Number Admin @ si @ GTS2023 Serial 3952  
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil; Antonio Rosell; Sonia Baeza; Carles Sanchez edit  openurl
  Title A radiomic biopsy for virtual histology of pulmonary nodules Type Conference Article
  Year 2023 Publication IEEE International Symposium on Biomedical Imaging Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) Pòster  
  Address Cartagena de Indias; Colombia; April 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ISBI  
  Notes IAM Approved no  
  Call Number Admin @ si @ TGR2023b Serial 3954  
Permanent link to this record
 

 
Author Albin Soutif; Antonio Carta; Andrea Cossu; Julio Hurtado; Hamed Hemati; Vincenzo Lomonaco; Joost Van de Weijer edit   pdf
url  openurl
  Title A Comprehensive Empirical Evaluation on Online Continual Learning Type Conference Article
  Year 2023 Publication Visual Continual Learning (ICCV-W) Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) Online continual learning aims to get closer to a live learning experience by learning directly on a stream of data with temporally shifting distribution and by storing a minimum amount of data from that stream. In this empirical evaluation, we evaluate various methods from the literature that tackle online continual learning. More specifically, we focus on the class-incremental setting in the context of image classification, where the learner must learn new classes incrementally from a stream of data. We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks, and measure their average accuracy, forgetting, stability, and quality of the representations, to evaluate various aspects of the algorithm at the end but also during the whole training period. We find that most methods suffer from stability and underfitting issues. However, the learned representations are comparable to i.i.d. training under the same computational budget. No clear winner emerges from the results and basic experience replay, when properly tuned and implemented, is a very strong baseline. We release our modular and extensible codebase at this https URL based on the avalanche framework to reproduce our results and encourage future research.  
  Address Paris; France; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCVW  
  Notes LAMP Approved no  
  Call Number Admin @ si @ SCC2023 Serial 3938  
Permanent link to this record
 

 
Author Yi Xiao; Felipe Codevilla; Diego Porres; Antonio Lopez edit  url
openurl 
  Title Scaling Vision-Based End-to-End Autonomous Driving with Multi-View Attention Learning Type Conference Article
  Year 2023 Publication International Conference on Intelligent Robots and Systems Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) On end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some latest models achieve better performance than CILRS by using expensive sensor suites and/or by using large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS by both processing higher-resolution images using a human-inspired HFOV as an inductive bias and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.  
  Address Detroit; USA; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IROS  
  Notes ADAS Approved no  
  Call Number Admin @ si @ XCP2023 Serial 3930  
Permanent link to this record
 

 
Author Roger Max Calle Quispe; Maya Aghaei Gavari; Eduardo Aguilar Torres edit  url
openurl 
  Title Towards real-time accurate safety helmets detection through a deep learning-based method Type Journal
  Year 2023 Publication Ingeniare. Revista chilena de ingenieria Abbreviated Journal  
  Volume 31 Issue 12 Pages  
  Keywords  
  Abstract (down) Occupational safety is a fundamental activity in industries and revolves around the management of the necessary controls that must be present to mitigate occupational risks. These controls include verifying the use of Personal Protection Equipment (PPE). Within PPE, safety helmets are vital to reducing severe or fatal consequences caused by head injuries. This problem has been addressed recently by various research based on deep learning to detect the usage of safety helmets by the present people in the industrial field.

These works have achieved promising results for safety helmet detection using object detection methods from the YOLO family. In this work, we propose to analyze the performance of Scaled-YOLOv4, a novel model of the YOLO family that has yet to be previously studied for this problem. The performance of the Scaled-YOLOv4 is evaluated on two public databases, carefully selected among the previously proposed datasets for the occupational safety framework. We demonstrate the superiority of Scaled-YOLOv4 in terms of mAP and Fl-score concerning the previous works for both databases. Further, we summarize the currently available datasets for safety helmet detection purposes and discuss their suitability.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ CAA2023 Serial 3846  
Permanent link to this record
 

 
Author Chengyi Zou; Shuai Wan; Tiannan Ji; Marc Gorriz Blanch; Marta Mrak; Luis Herranz edit  url
doi  openurl
  Title Chroma Intra Prediction with Lightweight Attention-Based Neural Networks Type Journal Article
  Year 2023 Publication IEEE Transactions on Circuits and Systems for Video Technology Abbreviated Journal TCSVT  
  Volume 34 Issue 1 Pages 549 - 560  
  Keywords  
  Abstract (down) Neural networks can be successfully used for cross-component prediction in video coding. In particular, attention-based architectures are suitable for chroma intra prediction using luma information because of their capability to model relations between difierent channels. However, the complexity of such methods is still very high and should be further reduced, especially for decoding. In this paper, a cost-effective attention-based neural network is designed for chroma intra prediction. Moreover, with the goal of further improving coding performance, a novel approach is introduced to utilize more boundary information effectively. In addition to improving prediction, a simplification methodology is also proposed to reduce inference complexity by simplifying convolutions. The proposed schemes are integrated into H.266/Versatile Video Coding (VVC) pipeline, and only one additional binary block-level syntax flag is introduced to indicate whether a given block makes use of the proposed method. Experimental results demonstrate that the proposed scheme achieves up to −0.46%/−2.29%/−2.17% BD-rate reduction on Y/Cb/Cr components, respectively, compared with H.266/VVC anchor. Reductions in the encoding and decoding complexity of up to 22% and 61%, respectively, are achieved by the proposed scheme with respect to the previous attention-based chroma intra prediction method while maintaining coding performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MACO; LAMP Approved no  
  Call Number Admin @ si @ ZWJ2023 Serial 3875  
Permanent link to this record
 

 
Author Albin Soutif; Antonio Carta; Joost Van de Weijer edit   pdf
url  openurl
  Title Improving Online Continual Learning Performance and Stability with Temporal Ensembles Type Conference Article
  Year 2023 Publication 2nd Conference on Lifelong Learning Agents Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) arXiv:2205.13452 showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspirations from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.  
  Address Montreal; Canada; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference COLLAS  
  Notes LAMP Approved no  
  Call Number Admin @ si @ SCW2023 Serial 3922  
Permanent link to this record
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades edit   pdf
doi  openurl
  Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 139 Issue Pages 109419  
  Keywords  
  Abstract (down) Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space}. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ BMC2023 Serial 3826  
Permanent link to this record
 

 
Author Gisel Bastidas-Guacho; Patricio Moreno; Boris X. Vintimilla; Angel Sappa edit  url
doi  openurl
  Title Application on the Loop of Multimodal Image Fusion: Trends on Deep-Learning Based Approaches Type Conference Article
  Year 2023 Publication 13th International Conference on Pattern Recognition Systems Abbreviated Journal  
  Volume 14234 Issue Pages 25–36  
  Keywords  
  Abstract (down) Multimodal image fusion allows the combination of information from different modalities, which is useful for tasks such as object detection, edge detection, and tracking, to name a few. Using the fused representation for applications results in better task performance. There are several image fusion approaches, which have been summarized in surveys. However, the existing surveys focus on image fusion approaches where the application on the loop of multimodal image fusion is not considered. On the contrary, this study summarizes deep learning-based multimodal image fusion for computer vision (e.g., object detection) and image processing applications (e.g., semantic segmentation), that is, approaches where the application module leverages the multimodal fusion process to enhance the final result. Firstly, we introduce image fusion and the existing general frameworks for image fusion tasks such as multifocus, multiexposure and multimodal. Then, we describe the multimodal image fusion approaches. Next, we review the state-of-the-art deep learning multimodal image fusion approaches for vision applications. Finally, we conclude our survey with the trends of task-driven multimodal image fusion.  
  Address Guayaquil; Ecuador; July 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPRS  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ BMV2023 Serial 3932  
Permanent link to this record
 

 
Author Akshita Gupta; Sanath Narayan; Salman Khan; Fahad Shahbaz Khan; Ling Shao; Joost Van de Weijer edit  doi
openurl 
  Title Generative Multi-Label Zero-Shot Learning Type Journal Article
  Year 2023 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 45 Issue 12 Pages 14611-14624  
  Keywords Generalized zero-shot learning; Multi-label classification; Zero-shot object detection; Feature synthesis  
  Abstract (down) Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features from GANs is still unexplored in the context of zero-shot setting. When multiple objects occur jointly in a single image, a critical question is how to effectively fuse multi-class information. In this work, we introduce different fusion approaches at the attribute-level, feature-level and cross-level (across attribute and feature-levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Our cross-level fusion-based generative approach outperforms the state-of-the-art on three zero-shot benchmarks: NUS-WIDE, Open Images and MS COCO. Furthermore, we show the generalization capabilities of our fusion approach in the zero-shot detection task on MS COCO, achieving favorable performance against existing methods.  
  Address December 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; PID2021-128178OB-I00 Approved no  
  Call Number Admin @ si @ Serial 3853  
Permanent link to this record
 

 
Author Alejandro Ariza-Casabona; Bartlomiej Twardowski; Tri Kurniawan Wijaya edit  url
openurl 
  Title Exploiting Graph Structured Cross-Domain Representation for Multi-domain Recommendation Type Conference Article
  Year 2023 Publication European Conference on Information Retrieval – ECIR 2023: Advances in Information Retrieval Abbreviated Journal  
  Volume 13980 Issue Pages 49–65  
  Keywords  
  Abstract (down) Multi-domain recommender systems benefit from cross-domain representation learning and positive knowledge transfer. Both can be achieved by introducing a specific modeling of input data (i.e. disjoint history) or trying dedicated training regimes. At the same time, treating domains as separate input sources becomes a limitation as it does not capture the interplay that naturally exists between domains. In this work, we efficiently learn multi-domain representation of sequential users’ interactions using graph neural networks. We use temporal intra- and inter-domain interactions as contextual information for our method called MAGRec (short for Multi-dom Ain Graph-based Recommender). To better capture all relations in a multi-domain setting, we learn two graph-based sequential representations simultaneously: domain-guided for recent user interest, and general for long-term interest. This approach helps to mitigate the negative knowledge transfer problem from multiple domains and improve overall representation. We perform experiments on publicly available datasets in different scenarios where MAGRec consistently outperforms state-of-the-art methods. Furthermore, we provide an ablation study and discuss further extensions of our method.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECIR  
  Notes LAMP Approved no  
  Call Number Admin @ si @ ATK2023 Serial 3933  
Permanent link to this record
 

 
Author Olivier Penacchio; Xavier Otazu; Arnold J Wilkings; Sara M. Haigh edit  url
openurl 
  Title A mechanistic account of visual discomfort Type Journal Article
  Year 2023 Publication Frontiers in Neuroscience Abbreviated Journal FN  
  Volume 17 Issue Pages  
  Keywords  
  Abstract (down) Much of the neural machinery of the early visual cortex, from the extraction of local orientations to contextual modulations through lateral interactions, is thought to have developed to provide a sparse encoding of contour in natural scenes, allowing the brain to process efficiently most of the visual scenes we are exposed to. Certain visual stimuli, however, cause visual stress, a set of adverse effects ranging from simple discomfort to migraine attacks, and epileptic seizures in the extreme, all phenomena linked with an excessive metabolic demand. The theory of efficient coding suggests a link between excessive metabolic demand and images that deviate from natural statistics. Yet, the mechanisms linking energy demand and image spatial content in discomfort remain elusive. Here, we used theories of visual coding that link image spatial structure and brain activation to characterize the response to images observers reported as uncomfortable in a biologically based neurodynamic model of the early visual cortex that included excitatory and inhibitory layers to implement contextual influences. We found three clear markers of aversive images: a larger overall activation in the model, a less sparse response, and a more unbalanced distribution of activity across spatial orientations. When the ratio of excitation over inhibition was increased in the model, a phenomenon hypothesised to underlie interindividual differences in susceptibility to visual discomfort, the three markers of discomfort progressively shifted toward values typical of the response to uncomfortable stimuli. Overall, these findings propose a unifying mechanistic explanation for why there are differences between images and between observers, suggesting how visual input and idiosyncratic hyperexcitability give rise to abnormal brain responses that result in visual stress.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes NEUROBIT Approved no  
  Call Number Admin @ si @ POW2023 Serial 3886  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: