Publicacions CVC -- Query Results



Records
Author	Carola Figueroa Flores; David Berga; Joost Van de Weijer; Bogdan Raducanu
Title	Saliency for free: Saliency prediction as a side-effect of object recognition			Type	Journal Article
Year	2021	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
Volume	150	Issue		Pages	1-7
Keywords	Saliency maps; Unsupervised learning; Object recognition
Abstract	Saliency is the perceptual capacity of our visual system to focus our attention (i.e. gaze) on relevant objects instead of the background. So far, computational methods for saliency estimation have required the explicit generation of a saliency map, a process that is usually achieved via eye-tracking experiments on still images. This is a tedious process that needs to be repeated for each new dataset. In the current paper, we demonstrate that it is possible to automatically generate saliency maps without ground truth. In our approach, saliency maps are learned as a side effect of object recognition. Extensive experiments carried out on both real and synthetic datasets demonstrate that our approach is able to generate accurate saliency maps, achieving competitive results when compared with supervised methods.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.147; 600.120			Approved	no
Call Number	Admin @ si @ FBW2021			Serial	3559



Author	Carola Figueroa Flores; Bogdan Raducanu; David Berga; Joost Van de Weijer
Title	Hallucinating Saliency Maps for Fine-Grained Image Classification for Limited Data Domains			Type	Conference Article
Year	2021	Publication	16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications	Abbreviated Journal
Volume	4	Issue		Pages	163-171
Keywords
Abstract	arXiv:2007.12562. Most saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline such as image classification. In the current paper, we propose an approach that does not require explicit saliency maps to improve image classification; instead, they are learned implicitly during the training of an end-to-end image classification task. We show that our approach obtains results similar to those obtained when the saliency maps are provided explicitly. Combining RGB data with saliency maps represents a significant advantage for object recognition, especially when training data is limited. We validate our method on several datasets for fine-grained classification tasks (Flowers, Birds and Cars). In addition, we show that our saliency estimation method, which is trained without any saliency ground-truth data, obtains competitive results on a real-image saliency benchmark (Toronto), and outperforms deep saliency models on synthetic images (SID4VAM).
Address	Virtual; February 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISAPP
Notes	LAMP			Approved	no
Call Number	Admin @ si @ FRB2021c			Serial	3540



Author	Carola Figueroa Flores
Title	Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency			Type	Book Whole
Year	2021	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords	computer vision; visual saliency; fine-grained object recognition; convolutional neural networks; image classification
Abstract	For humans, the recognition of objects is an almost instantaneous, precise and extremely adaptable process. Furthermore, we have the innate capability to learn new object classes from only a few examples. The human brain lowers the complexity of the incoming data by filtering out part of the information and only processing those things that capture our attention. This, mixed with our biological predisposition to respond to certain shapes or colors, allows us to recognize at a glance the most important or salient regions of an image. This mechanism can be observed by analyzing which parts of images subjects pay attention to, that is, where they fix their eyes when an image is shown to them. The most accurate way to record this behavior is to track eye movements while displaying images. Computational saliency estimation aims to identify to what extent regions or objects stand out with respect to their surroundings to human observers. Saliency maps can be used in a wide range of applications including object detection, image and video compression, and visual tracking. The majority of research in the field has focused on automatically estimating saliency maps given an input image. Instead, in this thesis, we set out to incorporate saliency maps in an object recognition pipeline: we want to investigate whether saliency maps can improve object recognition results. In this thesis, we identify several problems related to visual saliency estimation. First, to what extent can the estimation of saliency be exploited to improve the training of an object recognition model when scarce training data is available? To solve this problem, we design an image classification network that incorporates saliency information as input. This network processes the saliency map through a dedicated network branch and uses the resulting characteristics to modulate the standard bottom-up visual characteristics of the original image input. We will refer to this technique as saliency-modulated image classification (SMIC). In extensive experiments on standard benchmark datasets for fine-grained object recognition, we show that our proposed architecture can significantly improve performance, especially on datasets with scarce training data. Next, we address the main drawback of the above pipeline: SMIC requires an explicit saliency algorithm that must be trained on a saliency dataset. To solve this, we implement a hallucination mechanism that allows us to incorporate the saliency estimation branch in an end-to-end trained neural network architecture that only needs the RGB image as an input. A side effect of this architecture is the estimation of saliency maps. In experiments, we show that this architecture can obtain similar results on object recognition as SMIC but without the requirement of ground-truth saliency maps to train the system. Finally, we evaluate the accuracy of the saliency maps that occur as a side effect of object recognition. For this purpose, we use a set of benchmark datasets for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps that are computed from human eye-tracking experiments. Our results show that these saliency maps can obtain competitive results on saliency benchmarks. On one synthetic saliency dataset this method even obtains state-of-the-art results without ever having seen an actual saliency image during training.
Address	March 2021
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Bogdan Raducanu
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-122714-4-7	Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ Fig2021			Serial	3600



Author	Bartlomiej Twardowski; Pawel Zawistowski; Szymon Zaborowski
Title	Metric Learning for Session-Based Recommendations			Type	Conference Article
Year	2021	Publication	43rd edition of the annual BCS-IRSG European Conference on Information Retrieval	Abbreviated Journal
Volume	12656	Issue		Pages	650-665
Keywords	Session-based recommendations; Deep metric learning; Learning to rank
Abstract	Session-based recommenders, used for making predictions out of users’ uninterrupted sequences of actions, are attractive for many applications. Here, for this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures dissimilarity between the provided sequence of users’ events and the next action. We discuss and compare metric learning approaches to commonly used learning-to-rank methods, where some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither extensively big nor deep architectures are necessary in order to outperform existing methods. The experimental results against strong baselines on four datasets are provided with an ablation study.
Address	Virtual; March 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ECIR
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ TZZ2021			Serial	3586



Author	Arturo Fuentes; F. Javier Sanchez; Thomas Voncina; Jorge Bernal
Title	LAMV: Learning to Predict Where Spectators Look in Live Music Performances			Type	Conference Article
Year	2021	Publication	16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications	Abbreviated Journal
Volume	5	Issue		Pages	500-507
Keywords
Abstract	The advent of artificial intelligence has changed how different daily work tasks are performed. The analysis of cultural content has seen a huge boost from the development of computer-assisted methods that allow easy and transparent data access. In our case, we deal with the automation of the production of live shows, such as music concerts, aiming to develop a system that can indicate to the producer which camera to show based on what each of them is showing. In this context, we consider it essential to understand where spectators look and what they are interested in, so that the computational method can learn from this information. The work that we present here shows the results of a first preliminary study in which we compare areas of interest defined by human beings and those indicated by an automatic system. Our system is based on the extraction of motion textures from dynamic Spatio-Temporal Volumes (STV) and the subsequent analysis of the patterns by means of texture analysis techniques. We validate our approach over several video sequences that have been labeled by 16 different experts. Our method is able to match the relevant areas identified by the experts, achieving recall scores higher than 80% when a distance of 80 pixels between method and ground truth is considered. Current performance shows promise when detecting abnormal peaks and movement trends.
Address	Virtual; February 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISIGRAPP
Notes	MV; ISE; 600.119;			Approved	no
Call Number	Admin @ si @ FSV2021			Serial	3570



Author	Armin Mehri; Parichehr Behjati Ardakani; Angel Sappa
Title	MPRNet: Multi-Path Residual Network for Lightweight Image Super Resolution			Type	Conference Article
Year	2021	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages	2703-2712
Keywords
Abstract	Lightweight super resolution networks are extremely important for real-world applications. In recent years, several SR deep learning approaches with outstanding results have been introduced at the expense of memory and computational cost. To overcome this problem, a novel lightweight super resolution network is proposed, which improves the SOTA performance in lightweight SR and performs roughly similarly to computationally expensive networks. The Multi-Path Residual Network is designed with a set of Residual concatenation Blocks stacked with Adaptive Residual Blocks: (i) to adaptively extract informative features and learn more expressive spatial context information; (ii) to better leverage multi-level representations before the up-sampling stage; and (iii) to allow an efficient information and gradient flow within the network. The proposed architecture also contains a new attention mechanism, the Two-Fold Attention Module, to maximize the representation ability of the model. Extensive experiments show the superiority of our model against other SOTA SR approaches.
Address	Virtual; January 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	MSIAU; 600.130; 600.122			Approved	no
Call Number	Admin @ si @ MAS2021b			Serial	3582



Author	Armin Mehri; Parichehr Behjati Ardakani; Angel Sappa
Title	LiNet: A Lightweight Network for Image Super Resolution			Type	Conference Article
Year	2021	Publication	25th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	7196-7202
Keywords
Abstract	This paper proposes a new lightweight network, LiNet, that enhances technical efficiency in lightweight super resolution and performs approximately like networks that are very large and costly in terms of number of network parameters and operations. The proposed architecture allows the network to learn more abstract properties by avoiding low-level information via multiple links. LiNet introduces a Compact Dense Module, which contains a set of inner and outer blocks, to efficiently extract meaningful information, to better leverage multi-level representations before the upsampling stage, and to allow an efficient information and gradient flow within the network. Experiments on benchmark datasets show that the proposed LiNet achieves favorable performance against lightweight state-of-the-art methods.
Address	Virtual; January 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MSIAU; 600.130; 600.122			Approved	no
Call Number	Admin @ si @ MAS2021a			Serial	3583



Author	Arka Ujjal Dey; Suman Ghosh; Ernest Valveny; Gaurav Harit
Title	Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding			Type	Journal Article
Year	2021	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
Volume	149	Issue		Pages	164-171
Keywords
Abstract	Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only the visual features, neglecting to leverage the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We not only extract and encode visual and scene text cues, but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images with scene text content to demonstrate its effectiveness. In the retrieval framework, we augment our learned text-visual semantic representation with scene text cues, to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous recognition of scene text, we also apply query-based attention to our text channel. We show how the multi-channel approach, involving visual semantics and scene text, improves upon the state of the art.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121			Approved	no
Call Number	Admin @ si @ DGV2021			Serial	3364



Author	Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas
Title	Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval			Type	Conference Article
Year	2021	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages	4022-4032
Keywords
Abstract
Address	Virtual; January 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.121			Approved	no
Call Number	Admin @ si @ MDB2021			Serial	3491



Author	Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
Title	Real-time Lexicon-free Scene Text Retrieval			Type	Journal Article
Year	2021	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	110	Issue		Pages	107656
Keywords
Abstract	In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.129; 601.338			Approved	no
Call Number	Admin @ si @ MTD2021			Serial	3493



Author	Andres Mafla; Rafael S. Rezende; Lluis Gomez; Diana Larlus; Dimosthenis Karatzas
Title	StacMR: Scene-Text Aware Cross-Modal Retrieval			Type	Conference Article
Year	2021	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages	2219-2229
Keywords
Abstract
Address	Virtual; January 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.121			Approved	no
Call Number	Admin @ si @ MRG2021a			Serial	3492



Author	Andreea Glavan; Alina Matei; Petia Radeva; Estefania Talavera
Title	Does our social life influence our nutritional behaviour? Understanding nutritional habits from egocentric photo-streams			Type	Journal Article
Year	2021	Publication	Expert Systems with Applications	Abbreviated Journal	ESWA
Volume	171	Issue		Pages	114506
Keywords
Abstract	Nutrition and social interactions are both key aspects of the daily lives of humans. In this work, we propose a system to evaluate the influence of social interaction in the nutritional habits of a person from a first-person perspective. In order to detect the routine of an individual, we construct a nutritional behaviour pattern discovery model, which outputs routines over a number of days. Our method evaluates similarity of routines with respect to visited food-related scenes over the collected days, making use of Dynamic Time Warping, as well as considering social engagement and its correlation with food-related activities. The nutritional and social descriptors of the collected days are evaluated and encoded using an LSTM Autoencoder. Later, the obtained latent space is clustered to find similar days unaffected by outliers using the Isolation Forest method. Moreover, we introduce a new score metric to evaluate the performance of the proposed algorithm. We validate our method on 104 days and more than 100 k egocentric images gathered by 7 users. Several different visualizations are evaluated for the understanding of the findings. Our results demonstrate good performance and applicability of our proposed model for social-related nutritional behaviour understanding. At the end, relevant applications of the model are discussed by analysing the discovered routine of particular individuals.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MILAB; no proj			Approved	no
Call Number	Admin @ si @ GMR2021			Serial	3634



Author	AN Ruchai; VI Kober; KA Dorofeev; VN Karnaukhov; Mikhail Mozerov
Title	Classification of breast abnormalities using a deep convolutional neural network and transfer learning			Type	Journal Article
Year	2021	Publication	Journal of Communications Technology and Electronics	Abbreviated Journal
Volume	66	Issue	6	Pages	778–783
Keywords
Abstract	A new algorithm for classification of breast pathologies in digital mammography using a convolutional neural network and transfer learning is proposed. The following pretrained neural networks were chosen: MobileNetV2, InceptionResNetV2, Xception, and ResNetV2. All mammographic images were pre-processed to improve classification reliability. Transfer training was carried out using additional data augmentation and fine-tuning. The performance of the proposed algorithm for classification of breast pathologies in terms of accuracy on real data is discussed and compared with that of state-of-the-art algorithms on the available MIAS database.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP;			Approved	no
Call Number	Admin @ si @ RKD2022			Serial	3680



Author	Alina Matei; Andreea Glavan; Petia Radeva; Estefania Talavera
Title	Towards Eating Habits Discovery in Egocentric Photo-Streams			Type	Journal Article
Year	2021	Publication	IEEE Access	Abbreviated Journal	ACCESS
Volume	9	Issue		Pages	17495-17506
Keywords
Abstract	Eating habits are learned throughout the early stages of our lives. However, it is not easy to be aware of how our food-related routine affects our healthy living. In this work, we address the unsupervised discovery of nutritional habits from egocentric photo-streams. We build a food-related behavioral pattern discovery model, which discloses nutritional routines from the activities performed throughout the days. To do so, we rely on Dynamic-Time-Warping for the evaluation of similarity among the collected days. Within this framework, we present a simple, but robust and fast novel classification pipeline that outperforms the state-of-the-art on food-related image classification with a weighted accuracy and F-score of 70% and 63%, respectively. Later, we identify days composed of nutritional activities that do not describe the habits of the person as anomalies in the daily life of the user with the Isolation Forest method. Furthermore, we show an application for the identification of food-related scenes when the camera wearer eats in isolation. Results have shown the good performance of the proposed model and its relevance to visualize the nutritional habits of individuals.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MILAB; no proj			Approved	no
Call Number	Admin @ si @ MGR2021			Serial	3637



Author	Alejandro Cartas; Petia Radeva; Mariella Dimiccoli
Title	Modeling long-term interactions to enhance action recognition			Type	Conference Article
Year	2021	Publication	25th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	10351-10358
Keywords
Abstract	In this paper, we propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels. At the frame level, we use a region-based approach that takes as input a primary region roughly corresponding to the user's hands and a set of secondary regions potentially corresponding to the interacting objects, and calculates the action score through a CNN formulation. This information is then fed to a Hierarchical Long Short-Term Memory Network (HLSTM) that captures temporal dependencies between actions within and across shots. Ablation studies thoroughly validate the proposed approach, showing in particular that both levels of the HLSTM architecture contribute to performance improvement. Furthermore, quantitative comparisons show that the proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks, without relying on motion information.
Address	January 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	MILAB;			Approved	no
Call Number	Admin @ si @ CRD2021			Serial	3626