Publicacions CVC -- Query Results



Records
Author	Hamed H. Aghdam; Abel Gonzalez-Garcia; Joost Van de Weijer; Antonio Lopez
Title	Active Learning for Deep Detection Neural Networks			Type	Conference Article
Year	2019	Publication	18th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	3672-3680
Keywords
Abstract	The cost of drawing object bounding boxes (i.e., labeling) for millions of images is prohibitively high. For instance, labeling pedestrians in a regular urban image could take 35 seconds on average. Active learning aims to reduce the cost of labeling by selecting only those images that are informative to improve the detection network accuracy. In this paper, we propose a method to perform active learning of object detectors based on convolutional neural networks. We propose a new image-level scoring process to rank unlabeled images for their automatic selection, which clearly outperforms classical scores. The proposed method can be applied to videos and sets of still images. In the former case, temporal selection rules can complement our scoring process. As a relevant use case, we extensively study the performance of our method on the task of pedestrian detection. Overall, the experiments show that the proposed method performs better than random selection.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	ADAS; LAMP; 600.124; 600.109; 600.141; 600.120; 600.118			Approved	no
Call Number	Admin @ si @ AGW2019			Serial	3321



Author	Felipe Codevilla; Eder Santana; Antonio Lopez; Adrien Gaidon
Title	Exploring the Limitations of Behavior Cloning for Autonomous Driving			Type	Conference Article
Year	2019	Publication	18th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	9328-9337
Keywords
Abstract	Driving requires reacting to a wide variety of complex environment conditions and agent behaviors. Explicitly modeling each possible scenario is unrealistic. In contrast, imitation learning can, in theory, leverage data from large fleets of human-driven cars. Behavior cloning in particular has been successfully used to learn simple visuomotor policies end-to-end, but scaling to the full spectrum of driving behaviors remains an unsolved problem. In this paper, we propose a new benchmark to experimentally investigate the scalability and limitations of behavior cloning. We show that behavior cloning leads to state-of-the-art results, executing complex lateral and longitudinal maneuvers, even in unseen environments, without being explicitly programmed to do so. However, we confirm some limitations of the behavior cloning approach: some well-known limitations (e.g., dataset bias and overfitting), new generalization issues (e.g., dynamic objects and the lack of a causal modeling), and training instabilities, all requiring further research before behavior cloning can graduate to real-world driving. The code, dataset, benchmark, and agent studied in this paper can be found at github.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	ADAS; 600.124; 600.118			Approved	no
Call Number	Admin @ si @ CSL2019			Serial	3322



Author	Reza Azad; Maryam Asadi Aghbolaghi; Mahmood Fathy; Sergio Escalera
Title	Bi-Directional ConvLSTM U-Net with Densley Connected Convolutions			Type	Conference Article
Year	2019	Publication	Visual Recognition for Medical Images workshop	Abbreviated Journal
Volume		Issue		Pages	406-415
Keywords
Abstract	In recent years, deep learning-based networks have achieved state-of-the-art performance in medical image segmentation. Among the existing networks, U-Net has been successfully applied on medical image segmentation. In this paper, we propose an extension of U-Net, Bi-directional ConvLSTM U-Net with Densely connected convolutions (BCDU-Net), for medical image segmentation, in which we take full advantages of U-Net, bi-directional ConvLSTM (BConvLSTM) and the mechanism of dense convolutions. Instead of a simple concatenation in the skip connection of U-Net, we employ BConvLSTM to combine the feature maps extracted from the corresponding encoding path and the previous decoding up-convolutional layer in a non-linear way. To strengthen feature propagation and encourage feature reuse, we use densely connected convolutions in the last convolutional layer of the encoding path. Finally, we can accelerate the convergence speed of the proposed network by employing batch normalization (BN). The proposed model is evaluated on three datasets of: retinal blood vessel segmentation, skin lesion segmentation, and lung nodule segmentation, achieving state-of-the-art performance.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCVW
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ AAF2019			Serial	3324



Author	Mohammed Al Rawi; Ernest Valveny
Title	Compact and Efficient Multitask Learning in Vision, Language and Speech			Type	Conference Article
Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
Volume		Issue		Pages	2933-2942
Keywords
Abstract	Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCVW
Notes	DAG; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ RaV2019			Serial	3365



Author	Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
Title	Seeing and Hearing Egocentric Actions: How Much Can We Learn?			Type	Conference Article
Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
Volume		Issue		Pages	4470-4480
Keywords
Abstract	Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have considered to integrate the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCVW
Notes	MILAB; no proj			Approved	no
Call Number	Admin @ si @ CLR2019b			Serial	3385



Author	Lichao Zhang; Abel Gonzalez-Garcia; Joost Van de Weijer; Martin Danelljan; Fahad Shahbaz Khan
Title	Learning the Model Update for Siamese Trackers			Type	Conference Article
Year	2019	Publication	18th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	4009-4018
Keywords
Abstract	Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time. While such an approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. Therefore, we propose to replace the handcrafted update function with a method which learns to update. We use a convolutional neural network, called UpdateNet, which given the initial template, the accumulated template and the template of the current frame aims to estimate the optimal template for the next frame. The UpdateNet is compact and can easily be integrated into existing Siamese trackers. We demonstrate the generality of the proposed approach by applying it to two Siamese trackers, SiamFC and DaSiamRPN. Extensive experiments on VOT2016, VOT2018, LaSOT, and TrackingNet datasets demonstrate that our UpdateNet effectively predicts the new target template, outperforming the standard linear update. On the large-scale TrackingNet dataset, our UpdateNet improves the results of DaSiamRPN with an absolute gain of 3.9% in terms of success score.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	LAMP; 600.109; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ ZGW2019			Serial	3295



Author	Lichao Zhang; Martin Danelljan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan
Title	Multi-Modal Fusion for End-to-End RGB-T Tracking			Type	Conference Article
Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
Volume		Issue		Pages	2252-2261
Keywords
Abstract	We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on VOT-RGBT2019 dataset and RGBT210 dataset, evaluating each type of modality fusing on each model component. The results show that the proposed fusion mechanisms improve the performance of the single modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on VOT-RGBT2019 dataset. With this fusion mechanism we achieve the state-of-the-art performance on RGBT210 dataset.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCVW
Notes	LAMP; 600.109; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ ZDG2019			Serial	3279



Author	Ali Furkan Biten; R. Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
Title	Scene Text Visual Question Answering			Type	Conference Article
Year	2019	Publication	18th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	4291-4301
Keywords
Abstract	Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module. In addition we put forward a series of baseline methods, which provide further insight to the newly released dataset, and set the scene for further research.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	DAG; 600.129; 600.135; 601.338; 600.121			Approved	no
Call Number	Admin @ si @ BTM2019b			Serial	3285



Author	Axel Barroso-Laguna; Edgar Riba; Daniel Ponsa; Krystian Mikolajczyk
Title	Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters			Type	Conference Article
Year	2019	Publication	18th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	5835-5843
Keywords
Abstract	We introduce a novel approach for keypoint detection task that combines handcrafted and learned CNN filters within a shallow multi-scale architecture. Handcrafted filters provide anchor structures for learned filters, which localize, score and rank repeatable features. Scale-space representation is used within the network to extract keypoints at different levels. We design a loss function to detect robust features that exist across a range of scales and to maximize the repeatability score. Our Key.Net model is trained on data synthetically created from ImageNet and evaluated on HPatches benchmark. Results show that our approach outperforms state-of-the-art detectors in terms of repeatability, matching performance and complexity.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	MSIAU; 600.122			Approved	no
Call Number	Admin @ si @ BRP2019			Serial	3290



Author	Javad Zolfaghari Bengar; Abel Gonzalez-Garcia; Gabriel Villalonga; Bogdan Raducanu; Hamed H. Aghdam; Mikhail Mozerov; Antonio Lopez; Joost Van de Weijer
Title	Temporal Coherence for Active Learning in Videos			Type	Conference Article
Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
Volume		Issue		Pages	914-923
Keywords
Abstract	Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our active learning criterion is based on the estimated number of errors in terms of false positives and false negatives. The detections obtained by the object detector are used to define the nodes of a graph and tracked forward and backward to temporally link the nodes. Minimizing an energy function defined on this graphical model provides estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two datasets.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCVW
Notes	LAMP; ADAS; 600.124; 602.200; 600.118; 600.120; 600.141			Approved	no
Call Number	Admin @ si @ ZGV2019			Serial	3294



Author	David Berga; Xose R. Fernandez-Vidal; Xavier Otazu; Xose M. Pardo
Title	SID4VAM: A Benchmark Dataset with Synthetic Images for Visual Attention Modeling			Type	Conference Article
Year	2019	Publication	18th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	8788-8797
Keywords
Abstract	A benchmark of saliency models performance with a synthetic image dataset is provided. Model performance is evaluated through saliency metrics as well as the influence of model inspiration and consistency with human psychophysics. SID4VAM is composed of 230 synthetic images, with known salient regions. Images were generated with 15 distinct types of low-level features (e.g. orientation, brightness, color, size...) with a target-distractor popout type of synthetic patterns. We have used Free-Viewing and Visual Search task instructions and 7 feature contrasts for each feature category. Our study reveals that state-of-the-art Deep Learning saliency models do not perform well with synthetic pattern images; instead, models with Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation. This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye tracking image datasets.
Address	Seoul; Korea; October 2019
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	NEUROBIT; 600.128			Approved	no
Call Number	Admin @ si @ BFO2019b			Serial	3372



Author	A. Pujol; Antonio Lopez; Jose Luis Alba; Juan J. Villanueva
Title	Ridges, Valleys and Hausdorff Based Similarity Measures for Face Detection and Matching			Type	Miscellaneous
Year	2001	Publication	Proceedings of the 1st International Workshop on Pattern Recognition in Information Systems (PRIS’2001), ICEIS Press, Ana Fred and Anil K. Jain (Eds), pgs.80–90	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	Setubal (Portugal)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ADAS			Approved	no
Call Number	ADAS @ adas @ PLA2001			Serial	486



Author	Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca
Title	Human Body Pose Estimation in Multi-view Environments			Type	Book Chapter
Year	2022	Publication	ICT Applications for Smart Cities. Intelligent Systems Reference Library	Abbreviated Journal
Volume	224	Issue		Pages	79-99
Keywords
Abstract	This chapter tackles the challenging problem of human pose estimation in multi-view environments to handle scenes with self-occlusions. The proposed approach starts by first estimating the camera pose (extrinsic parameters) in multi-view scenarios; due to few real image datasets, different virtual scenes are generated by using a special simulator, for training and testing the proposed convolutional neural network based approaches. Then, these extrinsic parameters are used to establish the relation between different cameras into the multi-view scheme, which captures the pose of the person from different points of view at the same time. The proposed multi-view scheme allows to robustly estimate human body joints’ position even in situations where they are occluded. This would help to avoid possible false alarms in behavioral analysis systems of smart cities, as well as applications for physical therapy, safe moving assistance for the elderly among others. The chapter concludes by presenting experimental results in real scenes by using state-of-the-art and the proposed multi-view approaches.
Address	September 2022
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	ISRL
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-06306-0	Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ CSV2022b			Serial	3810



Author	Henry Velesaca; Patricia Suarez; Dario Carpio; Rafael E. Rivadeneira; Angel Sanchez; Angel Morera
Title	Video Analytics in Urban Environments: Challenges and Approaches			Type	Book Chapter
Year	2022	Publication	ICT Applications for Smart Cities	Abbreviated Journal
Volume	224	Issue		Pages	101-121
Keywords
Abstract	This chapter reviews state-of-the-art approaches generally present in the pipeline of video analytics on urban scenarios. A typical pipeline is used to cluster approaches in the literature, including image preprocessing, object detection, object classification, and object tracking modules. Then, a review of recent approaches for each module is given. Additionally, applications and datasets generally used for training and evaluating the performance of these approaches are included. This chapter does not pretend to be an exhaustive review of state-of-the-art video analytics in urban environments but rather an illustration of some of the different recent contributions. The chapter concludes by presenting current trends in video analytics in the urban scenario field.
Address	September 2022
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	ISRL
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-06306-0	Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ VSC2022			Serial	3811



Author	Angel Sappa (ed)
Title	ICT Applications for Smart Cities			Type	Book Whole
Year	2022	Publication	ICT Applications for Smart Cities	Abbreviated Journal
Volume	224	Issue		Pages
Keywords	Computational Intelligence; Intelligent Systems; Smart Cities; ICT Applications; Machine Learning; Pattern Recognition; Computer Vision; Image Processing
Abstract	Part of the book series: Intelligent Systems Reference Library (ISRL). This book is the result of four-year work in the framework of the Ibero-American Research Network TICs4CI funded by the CYTED program. In the following decades, 85% of the world's population is expected to live in cities; hence, urban centers should be prepared to provide smart solutions for problems ranging from video surveillance and intelligent mobility to the solid waste recycling processes, just to mention a few. More specifically, the book describes underlying technologies and practical implementations of several successful case studies of ICTs developed in the following smart city areas: • Urban environment monitoring • Intelligent mobility • Waste recycling processes • Video surveillance • Computer-aided diagnosis in healthcare systems • Computer vision-based approaches for efficiency in production processes. The book is intended for researchers and engineers in the field of ICTs for smart cities, as well as for anyone who wants to know about state-of-the-art approaches and challenges in this field.
Address	September 2022
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor	Angel Sappa
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	ISRL
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-06306-0	Medium
Area		Expedition		Conference
Notes	MSIAU; MACO			Approved	no
Call Number	Admin @ si @ Sap2022			Serial	3812