Publicacions CVC -- Query Results

	76–90 of 161 records found matching your query:


	Records
	Author	Roberto Morales; Juan Quispe; Eduardo Aguilar
	Title	Exploring multi-food detection using deep learning-based algorithms			Type	Conference Article
	Year	2023	Publication	13th International Conference on Pattern Recognition Systems	Abbreviated Journal
	Volume		Issue		Pages	1-7
	Keywords
	Abstract	People are becoming increasingly concerned about their diet, whether for disease prevention, medical treatment or other purposes. In meals served in restaurants, schools or public canteens, it is not easy to identify the ingredients and/or the nutritional information they contain. Currently, technological solutions based on deep learning models have facilitated the recording and tracking of food consumed based on the recognition of the main dish present in an image. Considering that sometimes there may be multiple foods served on the same plate, food analysis should be treated as a multi-class object detection problem. EfficientDet and YOLOv5 are object detection algorithms that have demonstrated high mAP and real-time performance on general domain data. However, these models have not been evaluated and compared on public food datasets. Unlike general domain objects, foods have more challenging features inherent in their nature that increase the complexity of detection. In this work, we performed a performance evaluation of EfficientDet and YOLOv5 on three public food datasets: UNIMIB2016, UECFood256 and ChileanFood64. From the results obtained, it can be seen that YOLOv5 provides a significant difference in terms of both mAP and response time compared to EfficientDet in all datasets. Furthermore, YOLOv5 outperforms the state-of-the-art on UECFood256, achieving an improvement of more than 4% in terms of mAP@.50 over the best reported.
	Address	Guayaquil; Ecuador; July 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICPRS
	Notes	MILAB			Approved	no
	Call Number	Admin @ si @ MQA2023			Serial	3843



	Author	Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez
	Title	Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules			Type	Conference Article
	Year	2023	Publication	37th International Congress and Exhibition of Computer Assisted Radiology and Surgery	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Poster
	Address	Munich; Germany; June 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CARS
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ TGR2023a			Serial	3950



	Author	Sonia Baeza; Debora Gil; Carles Sanchez; Guillermo Torres; Ignasi Garcia Olive; Ignasi Guasch; Samuel Garcia Reina; Felipe Andreo; Jose Luis Mate; Jose Luis Vercher; Antonio Rosell
	Title	Biopsia virtual radiomica para el diagnóstico histológico de nódulos pulmonares – Resultados intermedios del proyecto Radiolung			Type	Conference Article
	Year	2023	Publication	SEPAR	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Poster
	Address	Granada; Spain; June 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	SEPAR
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ BGS2023			Serial	3951



	Author	Debora Gil; Guillermo Torres; Carles Sanchez
	Title	Transforming radiomic features into radiological words			Type	Conference Article
	Year	2023	Publication	IEEE International Symposium on Biomedical Imaging	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Poster
	Address	Cartagena de Indias; Colombia; April 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ISBI
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ GTS2023			Serial	3952



	Author	Guillermo Torres; Debora Gil; Antonio Rosell; Sonia Baeza; Carles Sanchez
	Title	A radiomic biopsy for virtual histology of pulmonary nodules			Type	Conference Article
	Year	2023	Publication	IEEE International Symposium on Biomedical Imaging	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Poster
	Address	Cartagena de Indias; Colombia; April 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ISBI
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ TGR2023b			Serial	3954



	Author	Albin Soutif; Antonio Carta; Andrea Cossu; Julio Hurtado; Hamed Hemati; Vincenzo Lomonaco; Joost Van de Weijer
	Title	A Comprehensive Empirical Evaluation on Online Continual Learning			Type	Conference Article
	Year	2023	Publication	Visual Continual Learning (ICCV-W)	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Online continual learning aims to get closer to a live learning experience by learning directly on a stream of data with temporally shifting distribution and by storing a minimum amount of data from that stream. In this empirical evaluation, we evaluate various methods from the literature that tackle online continual learning. More specifically, we focus on the class-incremental setting in the context of image classification, where the learner must learn new classes incrementally from a stream of data. We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks, and measure their average accuracy, forgetting, stability, and quality of the representations, to evaluate various aspects of the algorithm at the end but also during the whole training period. We find that most methods suffer from stability and underfitting issues. However, the learned representations are comparable to i.i.d. training under the same computational budget. No clear winner emerges from the results and basic experience replay, when properly tuned and implemented, is a very strong baseline. We release our modular and extensible codebase at this https URL based on the avalanche framework to reproduce our results and encourage future research.
	Address	Paris; France; October 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	LAMP			Approved	no
	Call Number	Admin @ si @ SCC2023			Serial	3938



	Author	Yi Xiao; Felipe Codevilla; Diego Porres; Antonio Lopez
	Title	Scaling Vision-Based End-to-End Autonomous Driving with Multi-View Attention Learning			Type	Conference Article
	Year	2023	Publication	International Conference on Intelligent Robots and Systems	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	On end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some latest models achieve better performance than CILRS by using expensive sensor suites and/or by using large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS by both processing higher-resolution images using a human-inspired HFOV as an inductive bias and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.
	Address	Detroit; USA; October 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	IROS
	Notes	ADAS			Approved	no
	Call Number	Admin @ si @ XCP2023			Serial	3930



	Author	Roger Max Calle Quispe; Maya Aghaei Gavari; Eduardo Aguilar Torres
	Title	Towards real-time accurate safety helmets detection through a deep learning-based method			Type	Journal
	Year	2023	Publication	Ingeniare. Revista chilena de ingenieria	Abbreviated Journal
	Volume	31	Issue	12	Pages
	Keywords
	Abstract	Occupational safety is a fundamental activity in industries and revolves around the management of the necessary controls that must be present to mitigate occupational risks. These controls include verifying the use of Personal Protection Equipment (PPE). Within PPE, safety helmets are vital to reducing severe or fatal consequences caused by head injuries. This problem has been addressed recently by various research based on deep learning to detect the usage of safety helmets by the present people in the industrial field. These works have achieved promising results for safety helmet detection using object detection methods from the YOLO family. In this work, we propose to analyze the performance of Scaled-YOLOv4, a novel model of the YOLO family that has yet to be previously studied for this problem. The performance of the Scaled-YOLOv4 is evaluated on two public databases, carefully selected among the previously proposed datasets for the occupational safety framework. We demonstrate the superiority of Scaled-YOLOv4 in terms of mAP and F1-score concerning the previous works for both databases. Further, we summarize the currently available datasets for safety helmet detection purposes and discuss their suitability.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB			Approved	no
	Call Number	Admin @ si @ CAA2023			Serial	3846



	Author	Chengyi Zou; Shuai Wan; Tiannan Ji; Marc Gorriz Blanch; Marta Mrak; Luis Herranz
	Title	Chroma Intra Prediction with Lightweight Attention-Based Neural Networks			Type	Journal Article
	Year	2023	Publication	IEEE Transactions on Circuits and Systems for Video Technology	Abbreviated Journal	TCSVT
	Volume	34	Issue	1	Pages	549 - 560
	Keywords
	Abstract	Neural networks can be successfully used for cross-component prediction in video coding. In particular, attention-based architectures are suitable for chroma intra prediction using luma information because of their capability to model relations between different channels. However, the complexity of such methods is still very high and should be further reduced, especially for decoding. In this paper, a cost-effective attention-based neural network is designed for chroma intra prediction. Moreover, with the goal of further improving coding performance, a novel approach is introduced to utilize more boundary information effectively. In addition to improving prediction, a simplification methodology is also proposed to reduce inference complexity by simplifying convolutions. The proposed schemes are integrated into H.266/Versatile Video Coding (VVC) pipeline, and only one additional binary block-level syntax flag is introduced to indicate whether a given block makes use of the proposed method. Experimental results demonstrate that the proposed scheme achieves up to −0.46%/−2.29%/−2.17% BD-rate reduction on Y/Cb/Cr components, respectively, compared with H.266/VVC anchor. Reductions in the encoding and decoding complexity of up to 22% and 61%, respectively, are achieved by the proposed scheme with respect to the previous attention-based chroma intra prediction method while maintaining coding performance.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MACO; LAMP			Approved	no
	Call Number	Admin @ si @ ZWJ2023			Serial	3875



	Author	Albin Soutif; Antonio Carta; Joost Van de Weijer
	Title	Improving Online Continual Learning Performance and Stability with Temporal Ensembles			Type	Conference Article
	Year	2023	Publication	2nd Conference on Lifelong Learning Agents	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) arXiv:2205.13452 showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspirations from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.
	Address	Montreal; Canada; August 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	COLLAS
	Notes	LAMP			Approved	no
	Call Number	Admin @ si @ SCW2023			Serial	3922



	Author	Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
	Title	VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification			Type	Journal Article
	Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	139	Issue		Pages	109419
	Keywords
	Abstract	Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	ISSN 0031-3203	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ BMC2023			Serial	3826



	Author	Gisel Bastidas-Guacho; Patricio Moreno; Boris X. Vintimilla; Angel Sappa
	Title	Application on the Loop of Multimodal Image Fusion: Trends on Deep-Learning Based Approaches			Type	Conference Article
	Year	2023	Publication	13th International Conference on Pattern Recognition Systems	Abbreviated Journal
	Volume	14234	Issue		Pages	25–36
	Keywords
	Abstract	Multimodal image fusion allows the combination of information from different modalities, which is useful for tasks such as object detection, edge detection, and tracking, to name a few. Using the fused representation for applications results in better task performance. There are several image fusion approaches, which have been summarized in surveys. However, the existing surveys focus on image fusion approaches where the application on the loop of multimodal image fusion is not considered. On the contrary, this study summarizes deep learning-based multimodal image fusion for computer vision (e.g., object detection) and image processing applications (e.g., semantic segmentation), that is, approaches where the application module leverages the multimodal fusion process to enhance the final result. Firstly, we introduce image fusion and the existing general frameworks for image fusion tasks such as multifocus, multiexposure and multimodal. Then, we describe the multimodal image fusion approaches. Next, we review the state-of-the-art deep learning multimodal image fusion approaches for vision applications. Finally, we conclude our survey with the trends of task-driven multimodal image fusion.
	Address	Guayaquil; Ecuador; July 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICPRS
	Notes	MSIAU			Approved	no
	Call Number	Admin @ si @ BMV2023			Serial	3932



	Author	Akshita Gupta; Sanath Narayan; Salman Khan; Fahad Shahbaz Khan; Ling Shao; Joost Van de Weijer
	Title	Generative Multi-Label Zero-Shot Learning			Type	Journal Article
	Year	2023	Publication	IEEE Transactions on Pattern Analysis and Machine Intelligence	Abbreviated Journal	TPAMI
	Volume	45	Issue	12	Pages	14611-14624
	Keywords	Generalized zero-shot learning; Multi-label classification; Zero-shot object detection; Feature synthesis
	Abstract	Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features from GANs is still unexplored in the context of zero-shot setting. When multiple objects occur jointly in a single image, a critical question is how to effectively fuse multi-class information. In this work, we introduce different fusion approaches at the attribute-level, feature-level and cross-level (across attribute and feature-levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Our cross-level fusion-based generative approach outperforms the state-of-the-art on three zero-shot benchmarks: NUS-WIDE, Open Images and MS COCO. Furthermore, we show the generalization capabilities of our fusion approach in the zero-shot detection task on MS COCO, achieving favorable performance against existing methods.
	Address	December 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP; PID2021-128178OB-I00			Approved	no
	Call Number	Admin @ si @			Serial	3853



	Author	Alejandro Ariza-Casabona; Bartlomiej Twardowski; Tri Kurniawan Wijaya
	Title	Exploiting Graph Structured Cross-Domain Representation for Multi-domain Recommendation			Type	Conference Article
	Year	2023	Publication	European Conference on Information Retrieval – ECIR 2023: Advances in Information Retrieval	Abbreviated Journal
	Volume	13980	Issue		Pages	49–65
	Keywords
	Abstract	Multi-domain recommender systems benefit from cross-domain representation learning and positive knowledge transfer. Both can be achieved by introducing a specific modeling of input data (i.e. disjoint history) or trying dedicated training regimes. At the same time, treating domains as separate input sources becomes a limitation as it does not capture the interplay that naturally exists between domains. In this work, we efficiently learn multi-domain representation of sequential users’ interactions using graph neural networks. We use temporal intra- and inter-domain interactions as contextual information for our method called MAGRec (short for Multi-domAin Graph-based Recommender). To better capture all relations in a multi-domain setting, we learn two graph-based sequential representations simultaneously: domain-guided for recent user interest, and general for long-term interest. This approach helps to mitigate the negative knowledge transfer problem from multiple domains and improve overall representation. We perform experiments on publicly available datasets in different scenarios where MAGRec consistently outperforms state-of-the-art methods. Furthermore, we provide an ablation study and discuss further extensions of our method.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECIR
	Notes	LAMP			Approved	no
	Call Number	Admin @ si @ ATK2023			Serial	3933



	Author	Olivier Penacchio; Xavier Otazu; Arnold J Wilkings; Sara M. Haigh
	Title	A mechanistic account of visual discomfort			Type	Journal Article
	Year	2023	Publication	Frontiers in Neuroscience	Abbreviated Journal	FN
	Volume	17	Issue		Pages
	Keywords
	Abstract	Much of the neural machinery of the early visual cortex, from the extraction of local orientations to contextual modulations through lateral interactions, is thought to have developed to provide a sparse encoding of contour in natural scenes, allowing the brain to process efficiently most of the visual scenes we are exposed to. Certain visual stimuli, however, cause visual stress, a set of adverse effects ranging from simple discomfort to migraine attacks, and epileptic seizures in the extreme, all phenomena linked with an excessive metabolic demand. The theory of efficient coding suggests a link between excessive metabolic demand and images that deviate from natural statistics. Yet, the mechanisms linking energy demand and image spatial content in discomfort remain elusive. Here, we used theories of visual coding that link image spatial structure and brain activation to characterize the response to images observers reported as uncomfortable in a biologically based neurodynamic model of the early visual cortex that included excitatory and inhibitory layers to implement contextual influences. We found three clear markers of aversive images: a larger overall activation in the model, a less sparse response, and a more unbalanced distribution of activity across spatial orientations. When the ratio of excitation over inhibition was increased in the model, a phenomenon hypothesised to underlie interindividual differences in susceptibility to visual discomfort, the three markers of discomfort progressively shifted toward values typical of the response to uncomfortable stimuli. Overall, these findings propose a unifying mechanistic explanation for why there are differences between images and between observers, suggesting how visual input and idiosyncratic hyperexcitability give rise to abnormal brain responses that result in visual stress.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	NEUROBIT			Approved	no
	Call Number	Admin @ si @ POW2023			Serial	3886