Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	2506–2520 of 3413 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

[151–160] << 161 162 163 164 165 166 167 168 169 170 >> [171–180]

List View

Citations

Details

	Records
	Author	Patricia Suarez; Dario Carpio; Angel Sappa
	Title	Non-homogeneous Haze Removal Through a Multiple Attention Module Architecture			Type	Conference Article
	Year	2021	Publication	16th International Symposium on Visual Computing	Abbreviated Journal
	Volume	13018	Issue		Pages	178–190
	Keywords
	Abstract	This paper presents a novel attention based architecture to remove non-homogeneous haze. The proposed model is focused on obtaining the most representative characteristics of the image, at each learning cycle, by means of adaptive attention modules coupled with a residual learning convolutional network. The latter is based on the Res2Net model. The proposed architecture is trained with just a few set of images. Its performance is evaluated on a public benchmark—images from the non-homogeneous haze NTIRE 2021 challenge—and compared with state of the art approaches reaching the best result.
	Address	Virtual; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ISVC
	Notes	MSIAU			Approved	no
	Call Number	Admin @ si @ SCS2021			Serial	3668
Permanent link to this record



	Author	F.Negin; Pau Rodriguez; M.Koperski; A.Kerboua; Jordi Gonzalez; J.Bourgeois; E.Chapoulie; P.Robert; F.Bremond
	Title	PRAXIS: Towards automatic cognitive assessment using gesture recognition			Type	Journal Article
	Year	2018	Publication	Expert Systems with Applications	Abbreviated Journal	ESWA
	Volume	106	Issue		Pages	21-35
	Keywords
	Abstract	Praxis test is a gesture-based diagnostic test which has been accepted as diagnostically indicative of cortical pathologies such as Alzheimer’s disease. Despite being simple, this test is oftentimes skipped by the clinicians. In this paper, we propose a novel framework to investigate the potential of static and dynamic upper-body gestures based on the Praxis test and their potential in a medical framework to automatize the test procedures for computer-assisted cognitive assessment of older adults. In order to carry out gesture recognition as well as correctness assessment of the performances we have recollected a novel challenging RGB-D gesture video dataset recorded by Kinect v2, which contains 29 specific gestures suggested by clinicians and recorded from both experts and patients performing the gesture set. Moreover, we propose a framework to learn the dynamics of upper-body gestures, considering the videos as sequences of short-term clips of gestures. Our approach first uses body part detection to extract image patches surrounding the hands and then, by means of a fine-tuned convolutional neural network (CNN) model, it learns deep hand features which are then linked to a long short-term memory to capture the temporal dependencies between video frames. We report the results of four developed methods using different modalities. The experiments show effectiveness of our deep learning based approach in gesture recognition and performance assessment tasks. Satisfaction of clinicians from the assessment reports indicates the impact of framework corresponding to the diagnosis.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ISE			Approved	no
	Call Number	Admin @ si @ NRK2018			Serial	3669
Permanent link to this record



	Author	Javad Zolfaghari Bengar; Joost Van de Weijer; Bartlomiej Twardowski; Bogdan Raducanu
	Title	Reducing Label Effort: Self- Supervised Meets Active Learning			Type	Conference Article
	Year	2021	Publication	International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	1631-1639
	Keywords
	Abstract	Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training that learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on few labeled samples. Recent developments in self-training have achieved very impressive results rivaling supervised learning on some datasets. The current work focuses on whether the two paradigms can benefit from each other. We studied object recognition datasets including CIFAR10, CIFAR100 and Tiny ImageNet with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high. The performance gap between active learning trained either with self-training or from scratch diminishes as we approach to the point where almost half of the dataset is labeled.
	Address	October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	LAMP;			Approved	no
	Call Number	Admin @ si @ ZVT2021			Serial	3672
Permanent link to this record



	Author	Pau Riba; Sounak Dey; Ali Furkan Biten; Josep Llados
	Title	Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild			Type	Miscellaneous
	Year	2021	Publication	Arxiv	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute with a tough-to-beat baseline that without any specific SGOL training is able to outperform the previous works on a fixed set of classes. The baseline is useful to analyze the performance of SGOL approaches based on available simple yet powerful methods. We advance prior arts by proposing a sketch-conditioned DETR (DEtection TRansformer) architecture which avoids a hard classification and alleviates the domain gap between sketches and images to localize object instances. Although the main goal of SGOL is focused on object detection, we explored its natural extension to sketch-guided instance segmentation. This novel task allows to move towards identifying the objects at pixel level, which is of key importance in several applications. We experimentally demonstrate that our model and its variants significantly advance over previous state-of-the-art results. All training and testing code of our model will be released to facilitate future researchhttps://github.com/priba/sgol_wild.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ RDB2021			Serial	3674
Permanent link to this record



	Author	Albert Suso; Pau Riba; Oriol Ramos Terrades; Josep Llados
	Title	A Self-supervised Inverse Graphics Approach for Sketch Parametrization			Type	Conference Article
	Year	2021	Publication	16th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume	12916	Issue		Pages	28-42
	Keywords
	Abstract	The study of neural generative models of handwritten text and human sketches is a hot topic in the computer vision field. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints, and more recent articles have managed to generate fully vector sketches by coding the strokes as Bézier curves. However, the previous attempts with this approach need them all a ground truth consisting in the sequence of points that make up each stroke, which seriously limits the datasets the model is able to train in. In this work, we present a self-supervised end-to-end inverse graphics approach that learns to embed each image to its best fit of Bézier curves. The self-supervised nature of the training process allows us to train the model in a wider range of datasets, but also to perform better after-training predictions by applying an overfitting process on the input binary image. We report qualitative an quantitative evaluations on the MNIST and the Quick, Draw! datasets.
	Address	Lausanne; Suissa; September 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ SRR2021			Serial	3675
Permanent link to this record



	Author	Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
	Title	Graph-Based Deep Generative Modelling for Document Layout Generation			Type	Conference Article
	Year	2021	Publication	16th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume	12917	Issue		Pages	525-537
	Keywords
	Abstract	One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, specially in digital mailroom applications. It is also the first graph-based approach for document layout generation task experimented on administrative document images, in this case, invoices.
	Address	Lausanne; Suissa; September 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121; 600.140; 110.312			Approved	no
	Call Number	Admin @ si @ BRL2021			Serial	3676
Permanent link to this record



	Author	Josep Llados
	Title	The 5G of Document Intelligence			Type	Conference Article
	Year	2021	Publication	3rd Workshop on Future of Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address	Lausanne; Suissa; September 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	FDAR
	Notes	DAG			Approved	no
	Call Number	Admin @ si @			Serial	3677
Permanent link to this record



	Author	Mohamed Ali Souibgui; Sanket Biswas; Sana Khamekhem Jemni; Yousri Kessentini; Alicia Fornes; Josep Llados; Umapada Pal
	Title	DocEnTr: An End-to-End Document Image Enhancement Transformer			Type	Conference Article
	Year	2022	Publication	26th International Conference on Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	1699-1705
	Keywords	Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
	Abstract	Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
	Address	August 21-25, 2022 , Montréal Québec
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICPR
	Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
	Call Number	Admin @ si @ SBJ2022			Serial	3730
Permanent link to this record



	Author	Fei Yang; Yaxing Wang; Luis Herranz; Yongmei Cheng; Mikhail Mozerov
	Title	A Novel Framework for Image-to-image Translation and Image Compression			Type	Journal Article
	Year	2022	Publication	Neurocomputing	Abbreviated Journal	NEUCOM
	Volume	508	Issue		Pages	58-70
	Keywords
	Abstract	Data-driven paradigms using machine learning are becoming ubiquitous in image processing and communications. In particular, image-to-image (I2I) translation is a generic and widely used approach to image processing problems, such as image synthesis, style transfer, and image restoration. At the same time, neural image compression has emerged as a data-driven alternative to traditional coding approaches in visual communications. In this paper, we study the combination of these two paradigms into a joint I2I compression and translation framework, focusing on multi-domain image synthesis. We first propose distributed I2I translation by integrating quantization and entropy coding into an I2I translation framework (i.e. I2Icodec). In practice, the image compression functionality (i.e. autoencoding) is also desirable, requiring to deploy alongside I2Icodec a regular image codec. Thus, we further propose a unified framework that allows both translation and autoencoding capabilities in a single codec. Adaptive residual blocks conditioned on the translation/compression mode provide flexible adaptation to the desired functionality. The experiments show promising results in both I2I translation and image compression using a single model.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP			Approved	no
	Call Number	Admin @ si @ YWH2022			Serial	3679
Permanent link to this record



	Author	Shun Yao; Fei Yang; Yongmei Cheng; Mikhail Mozerov
	Title	3D Shapes Local Geometry Codes Learning with SDF			Type	Conference Article
	Year	2021	Publication	International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2110-2117
	Keywords
	Abstract	A signed distance function (SDF) as the 3D shape description is one of the most effective approaches to represent 3D geometry for rendering and reconstruction. Our work is inspired by the state-of-the-art method DeepSDF [17] that learns and analyzes the 3D shape as the iso-surface of its shell and this method has shown promising results especially in the 3D shape reconstruction and compression domain. In this paper, we consider the degeneration problem of reconstruction coming from the capacity decrease of the DeepSDF model, which approximates the SDF with a neural network and a single latent code. We propose Local Geometry Code Learning (LGCL), a model that improves the original DeepSDF results by learning from a local shape geometry of the full 3D shape. We add an extra graph neural network to split the single transmittable latent code into a set of local latent codes distributed on the 3D shape. Mentioned latent codes are used to approximate the SDF in their local regions, which will alleviate the complexity of the approximation compared to the original DeepSDF. Furthermore, we introduce a new geometric loss function to facilitate the training of these local latent codes. Note that other local shape adjusting methods use the 3D voxel representation, which in turn is a problem highly difficult to solve or even is insolvable. In contrast, our architecture is based on graph processing implicitly and performs the learning regression process directly in the latent code space, thus make the proposed architecture more flexible and also simple for realization. Our experiments on 3D shape reconstruction demonstrate that our LGCL method can keep more details with a significantly smaller size of the SDF decoder and outperforms considerably the original DeepSDF method under the most important quantitative metrics.
	Address	VIRTUAL; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	LAMP			Approved	no
	Call Number	Admin @ si @ YYC2021			Serial	3681
Permanent link to this record



	Author	Yasuko Sugito; Javier Vazquez; Trevor Canham; Marcelo Bertalmio
	Title	Image quality evaluation in professional HDR/WCG production questions the need for HDR metrics			Type	Journal Article
	Year	2022	Publication	IEEE Transactions on Image Processing	Abbreviated Journal	TIP
	Volume	31	Issue		Pages	5163 - 5177
	Keywords	Measurement; Image color analysis; Image coding; Production; Dynamic range; Brightness; Extraterrestrial measurements
	Abstract	In the quality evaluation of high dynamic range and wide color gamut (HDR/WCG) images, a number of works have concluded that native HDR metrics, such as HDR visual difference predictor (HDR-VDP), HDR video quality metric (HDR-VQM), or convolutional neural network (CNN)-based visibility metrics for HDR content, provide the best results. These metrics consider only the luminance component, but several color difference metrics have been specifically developed for, and validated with, HDR/WCG images. In this paper, we perform subjective evaluation experiments in a professional HDR/WCG production setting, under a real use case scenario. The results are quite relevant in that they show, firstly, that the performance of HDR metrics is worse than that of a classic, simple standard dynamic range (SDR) metric applied directly to the HDR content; and secondly, that the chrominance metrics specifically developed for HDR/WCG imaging have poor correlation with observer scores and are also outperformed by an SDR metric. Based on these findings, we show how a very simple framework for creating color HDR metrics, that uses only luminance SDR metrics, transfer functions, and classic color spaces, is able to consistently outperform, by a considerable margin, state-of-the-art HDR metrics on a varied set of HDR content, for both perceptual quantization (PQ) and Hybrid Log-Gamma (HLG) encoding, luminance and chroma distortions, and on different color spaces of common use.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	600.161; 611.007			Approved	no
	Call Number	Admin @ si @ SVG2022			Serial	3683
Permanent link to this record



	Author	Kai Wang; Xialei Liu; Andrew Bagdanov; Luis Herranz; Shangling Jui; Joost Van de Weijer
	Title	Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition			Type	Conference Article
	Year	2022	Publication	CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition)	Abbreviated Journal
	Volume		Issue		Pages	3728-3738
	Keywords	Training; Computer vision; Image recognition; Upper bound; Conferences; Pattern recognition; Task analysis
	Abstract	In this paper we consider the problem of incremental meta-learning in which classes are presented incrementally in discrete tasks. We propose Episodic Replay Distillation (ERD), that mixes classes from the current task with exemplars from previous tasks when sampling episodes for meta-learning. To allow the training to benefit from a large as possible variety of classes, which leads to more gener- alizable feature representations, we propose the cross-task meta loss. Furthermore, we propose episodic replay distillation that also exploits exemplars for improved knowledge distillation. Experiments on four datasets demonstrate that ERD surpasses the state-of-the-art. In particular, on the more challenging one-shot, long task sequence scenarios, we reduce the gap between Incremental Meta-Learning and the joint-training upper bound from 3.5% / 10.1% / 13.4% / 11.7% with the current state-of-the-art to 2.6% / 2.9% / 5.0% / 0.2% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100 / CUB, respectively.
	Address	New Orleans, USA; 20 June 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPRW
	Notes	LAMP; 600.147			Approved	no
	Call Number	Admin @ si @ WLB2022			Serial	3686
Permanent link to this record



	Author	Zhaocheng Liu; Luis Herranz; Fei Yang; Saiping Zhang; Shuai Wan; Marta Mrak; Marc Gorriz
	Title	Slimmable Video Codec			Type	Conference Article
	Year	2022	Publication	CVPR 2022 Workshop and Challenge on Learned Image Compression (CLIC 2022, 5th Edition)	Abbreviated Journal
	Volume		Issue		Pages	1742-1746
	Keywords
	Abstract	Neural video compression has emerged as a novel paradigm combining trainable multilayer neural net-works and machine learning, achieving competitive rate-distortion (RD) performances, but still remaining impractical due to heavy neural architectures, with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dynamically adjust their model capacity to gracefully reduce the memory and computation requirements, without harming RD performance. In this paper we propose a slimmable video codec (SlimVC), by integrating a slimmable temporal entropy model in a slimmable autoencoder. Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency, all being important requirements for practical video compression.
	Address	Virtual; 19 June 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPRW
	Notes	MACO; 601.379; 601.161			Approved	no
	Call Number	Admin @ si @ LHY2022			Serial	3687
Permanent link to this record



	Author	Jorge Charco; Angel Sappa; Boris X. Vintimilla
	Title	Human Pose Estimation through a Novel Multi-view Scheme			Type	Conference Article
	Year	2022	Publication	17th International Conference on Computer Vision Theory and Applications (VISAPP 2022)	Abbreviated Journal
	Volume	5	Issue		Pages	855-862
	Keywords	Multi-view Scheme; Human Pose Estimation; Relative Camera Pose; Monocular Approach
	Abstract	This paper presents a multi-view scheme to tackle the challenging problem of the self-occlusion in human pose estimation problem. The proposed approach first obtains the human body joints of a set of images, which are captured from different views at the same time. Then, it enhances the obtained joints by using a multi-view scheme. Basically, the joints from a given view are used to enhance poorly estimated joints from another view, especially intended to tackle the self occlusions cases. A network architecture initially proposed for the monocular case is adapted to be used in the proposed multi-view scheme. Experimental results and comparisons with the state-of-the-art approaches on Human3.6m dataset are presented showing improvements in the accuracy of body joints estimations.
	Address	On line; Feb 6, 2022 – Feb 8, 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	2184-4321	ISBN	978-989-758-555-5	Medium
	Area		Expedition		Conference	VISAPP
	Notes	MSIAU; 600.160			Approved	no
	Call Number	Admin @ si @ CSV2022			Serial	3689
Permanent link to this record



	Author	Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla
	Title	Multi-Image Super-Resolution for Thermal Images			Type	Conference Article
	Year	2022	Publication	17th International Conference on Computer Vision Theory and Applications (VISAPP 2022)	Abbreviated Journal
	Volume	4	Issue		Pages	635-642
	Keywords	Thermal Images; Multi-view; Multi-frame; Super-Resolution; Deep Learning; Attention Block
	Abstract	This paper proposes a novel CNN architecture for the multi-thermal image super-resolution problem. In the proposed scheme, the multi-images are synthetically generated by downsampling and slightly shifting the given image; noise is also added to each of these synthesized images. The proposed architecture uses two attention blocks paths to extract high-frequency details taking advantage of the large information extracted from multiple images of the same scene. Experimental results are provided, showing the proposed scheme has overcome the state-of-the-art approaches.
	Address	Online; Feb 6-8, 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	VISAPP
	Notes	MSIAU; 601.349			Approved	no
	Call Number	Admin @ si @ RSV2022a			Serial	3690
Permanent link to this record