Publicacions CVC -- Query Results


Records
Author	Nil Ballus; Bhalaji Nagarajan; Petia Radeva
Title	Opt-SSL: An Enhanced Self-Supervised Framework for Food Recognition			Type	Conference Article
Year	2022	Publication	10th Iberian Conference on Pattern Recognition and Image Analysis	Abbreviated Journal
Volume	13256	Issue		Pages
Keywords	Self-supervised; Contrastive learning; Food recognition
Abstract	Self-supervised learning has shown strong performance in several computer vision tasks. The popular contrastive methods make use of a Siamese architecture with different loss functions. In this work, we go deeper into two very recent state-of-the-art frameworks, namely SimSiam and Barlow Twins. Inspired by them, we propose a new self-supervised learning method, which we call Opt-SSL, that combines both image and feature contrasting. We validate the proposed method on the food recognition task, showing that our framework enables the self-learning networks to learn better visual representations.
Address	Aveiro; Portugal; May 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	IbPRIA
Notes	MILAB; does not mention			Approved	no
Call Number	Admin @ si @ BNR2022			Serial	3782



Author	Yaxing Wang; Joost Van de Weijer; Lu Yu; Shangling Jui
Title	Distilling GANs with Style-Mixed Triplets for X2I Translation with Limited Data			Type	Conference Article
Year	2022	Publication	10th International Conference on Learning Representations	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Conditional image synthesis is an integral part of many X2I translation systems, including image-to-image, text-to-image and audio-to-image translation systems. Training these large systems generally requires huge amounts of training data. Therefore, we investigate knowledge distillation to transfer knowledge from a high-quality unconditioned generative model (e.g., StyleGAN) to the conditioned synthetic image generation modules in a variety of systems. To initialize the conditional and reference branch (from an unconditional GAN), we exploit the style mixing characteristics of high-quality GANs to generate an infinite supply of style-mixed triplets to perform the knowledge distillation. Extensive experimental results in a number of image generation tasks (i.e., image-to-image, semantic segmentation-to-image, text-to-image and audio-to-image) demonstrate qualitatively and quantitatively that our method successfully transfers knowledge to the synthetic image generation modules, resulting in more realistic images than previous methods, as confirmed by a significant drop in the FID.
Address	Virtual
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICLR
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ WWY2022			Serial	3791



Author	Sergi Garcia Bordils; George Tom; Sangeeth Reddy; Minesh Mathew; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas
Title	Read While You Drive-Multilingual Text Tracking on the Road			Type	Conference Article
Year	2022	Publication	15th IAPR International workshop on document analysis systems	Abbreviated Journal
Volume	13237	Issue		Pages	756–770
Keywords
Abstract	Visual data obtained during driving scenarios usually contain large amounts of text that conveys semantic information necessary to analyse the urban environment and is integral to the traffic control plan. Yet, research on autonomous driving or driver assistance systems typically ignores this information. To advance research in this direction, we present RoadText-3K, a large driving video dataset with fully annotated text. RoadText-3K is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions and multiple languages and scripts. We offer a comprehensive analysis of tracking by detection and detection by tracking methods exploring the limits of state-of-the-art text detection. Finally, we propose a new end-to-end trainable tracking model that yields state-of-the-art results on this challenging dataset. Our experiments demonstrate the complexity and variability of RoadText-3K and establish a new, realistic benchmark for scene text tracking in the wild.
Address	La Rochelle; France; May 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-06554-5	Medium
Area		Expedition		Conference	DAS
Notes	DAG; 600.155; 611.022; 611.004			Approved	no
Call Number	Admin @ si @ GTR2022			Serial	3783



Author	Patricia Suarez; Dario Carpio; Angel Sappa; Henry Velesaca
Title	Transformer based Image Dehazing			Type	Conference Article
Year	2022	Publication	16th IEEE International Conference on Signal Image Technology & Internet Based System	Abbreviated Journal
Volume		Issue		Pages
Keywords	atmospheric light; brightness component; computational cost; dehazing quality; haze-free image
Abstract	This paper presents a novel approach to remove non-homogeneous haze from real images. The proposed method consists mainly of image feature extraction, haze removal, and image reconstruction. To accomplish this challenging task, we propose an architecture based on transformers, which have been recently introduced and have shown great potential in different computer vision tasks. Our model is based on SwinIR, a transformer-based image restoration architecture, but modifies the deep feature extraction module and the depth level of the model, and applies a combined loss function that improves styling and adapts the model to the removal of the non-homogeneous haze present in images. The obtained results prove to be superior to those obtained by state-of-the-art models.
Address	Dijon; France; October 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	SITIS
Notes	MSIAU; no proj			Approved	no
Call Number	Admin @ si @ SCS2022			Serial	3803



Author	Angel Sappa; Patricia Suarez; Henry Velesaca; Dario Carpio
Title	Domain Adaptation in Image Dehazing: Exploring the Usage of Images from Virtual Scenarios			Type	Conference Article
Year	2022	Publication	16th International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing	Abbreviated Journal
Volume		Issue		Pages	85-92
Keywords	Domain adaptation; Synthetic hazed dataset; Dehazing
Abstract	This work presents a novel domain adaptation strategy for deep learning-based approaches to solve the image dehazing problem. Firstly, a large set of synthetic images is generated by using a realistic 3D graphic simulator; these synthetic images contain different densities of haze and are used for training the model that is later adapted to any real scenario. The adaptation process requires just a few images to fine-tune the model parameters. The proposed strategy allows overcoming the limitation of training a given model with few images. In other words, the proposed strategy implements the adaptation of a haze removal model trained with synthetic images to real scenarios. It should be noted that it is quite difficult, if not impossible, to have large sets of pairs of real-world images (with and without haze) to train dehazing algorithms in a supervised way. Experimental results are provided showing the validity of the proposed domain adaptation strategy.
Address	Lisboa; Portugal; July 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CGVCVIP
Notes	MSIAU; no proj			Approved	no
Call Number	Admin @ si @ SSV2022			Serial	3804



Author	Andrea Gemelli; Sanket Biswas; Enrico Civitelli; Josep Llados; Simone Marinai
Title	Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks			Type	Conference Article
Year	2022	Publication	17th European Conference on Computer Vision Workshops	Abbreviated Journal
Volume	13804	Issue		Pages	329–344
Keywords
Abstract	Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-031-25068-2	Medium
Area		Expedition		Conference	ECCV-TiE
Notes	DAG; 600.162; 600.140; 110.312			Approved	no
Call Number	Admin @ si @ GBC2022			Serial	3795



Author	Jorge Charco; Angel Sappa; Boris X. Vintimilla
Title	Human Pose Estimation through a Novel Multi-view Scheme			Type	Conference Article
Year	2022	Publication	17th International Conference on Computer Vision Theory and Applications (VISAPP 2022)	Abbreviated Journal
Volume	5	Issue		Pages	855-862
Keywords	Multi-view Scheme; Human Pose Estimation; Relative Camera Pose; Monocular Approach
Abstract	This paper presents a multi-view scheme to tackle the challenging problem of self-occlusion in human pose estimation. The proposed approach first obtains the human body joints from a set of images, which are captured from different views at the same time. Then, it enhances the obtained joints by using a multi-view scheme. Basically, the joints from a given view are used to enhance poorly estimated joints from another view, especially intended to tackle self-occlusion cases. A network architecture initially proposed for the monocular case is adapted to be used in the proposed multi-view scheme. Experimental results and comparisons with state-of-the-art approaches on the Human3.6m dataset are presented, showing improvements in the accuracy of body joint estimations.
Address	Online; Feb 6-8, 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	2184-4321	ISBN	978-989-758-555-5	Medium
Area		Expedition		Conference	VISAPP
Notes	MSIAU; 600.160			Approved	no
Call Number	Admin @ si @ CSV2022			Serial	3689



Author	Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla
Title	Multi-Image Super-Resolution for Thermal Images			Type	Conference Article
Year	2022	Publication	17th International Conference on Computer Vision Theory and Applications (VISAPP 2022)	Abbreviated Journal
Volume	4	Issue		Pages	635-642
Keywords	Thermal Images; Multi-view; Multi-frame; Super-Resolution; Deep Learning; Attention Block
Abstract	This paper proposes a novel CNN architecture for the multi-thermal image super-resolution problem. In the proposed scheme, the multi-images are synthetically generated by downsampling and slightly shifting the given image; noise is also added to each of these synthesized images. The proposed architecture uses two attention-block paths to extract high-frequency details, taking advantage of the large amount of information extracted from multiple images of the same scene. Experimental results are provided, showing that the proposed scheme outperforms state-of-the-art approaches.
Address	Online; Feb 6-8, 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISAPP
Notes	MSIAU; 601.349			Approved	no
Call Number	Admin @ si @ RSV2022a			Serial	3690



Author	Bhalaji Nagarajan; Ricardo Marques; Marcos Mejia; Petia Radeva
Title	Class-conditional Importance Weighting for Deep Learning with Noisy Labels			Type	Conference Article
Year	2022	Publication	17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications	Abbreviated Journal
Volume	5	Issue		Pages	679-686
Keywords	Noisy Labeling; Loss Correction; Class-conditional Importance Weighting; Learning with Noisy Labels
Abstract	Large-scale accurate labels are very important for training Deep Neural Networks and ensuring high performance. However, it is very expensive to create a clean dataset since it usually relies on human interaction. For this reason, the labelling process is made cheaper at the cost of having noisy labels. Learning with Noisy Labels is an active and, at the same time, very challenging area of research. The recent advances in self-supervised learning and robust loss functions have helped in advancing noisy label research. In this paper, we propose a loss correction method that relies on dynamic weights computed based on the model training. We extend the existing Contrast to Divide algorithm coupled with DivideMix using a new class-conditional weighted scheme. We validate the method using the standard noise experiments and achieve encouraging results.
Address	Virtual; February 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISAPP
Notes	MILAB; does not mention			Approved	no
Call Number	Admin @ si @ NMM2022			Serial	3798



Author	Patricia Suarez; Angel Sappa; Dario Carpio; Henry Velesaca; Francisca Burgos; Patricia Urdiales
Title	Deep Learning Based Shrimp Classification			Type	Conference Article
Year	2022	Publication	17th International Symposium on Visual Computing	Abbreviated Journal
Volume	13598	Issue		Pages	36–45
Keywords	Pigmentation; Color space; Light weight network
Abstract	This work proposes a novel approach based on deep learning to address the classification of shrimp (Penaeus vannamei) into two classes, according to the level of pigmentation accepted by shrimp commerce. The main goal of this study is to support the shrimp industry in terms of price and process. An efficient CNN architecture is proposed to perform image classification through a program that could be deployed either on mobile devices or on a fixed support in the shrimp supply chain. The proposed approach is a lightweight model that uses HSV color space shrimp images. A simple pipeline shows the most important stages performed to determine a pattern that identifies the class to which the shrimp belong based on their pigmentation. For the experiments, a database of shrimp images acquired with mobile devices of various brands and models has been used. The results obtained with the images in the RGB and HSV color spaces allow for testing the effectiveness of the proposed model.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ISVC
Notes	MSIAU; no proj			Approved	no
Call Number	Admin @ si @ SAC2022			Serial	3772



Author	Mohamed Ali Souibgui; Sanket Biswas; Sana Khamekhem Jemni; Yousri Kessentini; Alicia Fornes; Josep Llados; Umapada Pal
Title	DocEnTr: An End-to-End Document Image Enhancement Transformer			Type	Conference Article
Year	2022	Publication	26th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	1699-1705
Keywords	Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
Abstract	Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. The experiments conducted show the superiority of the proposed model over state-of-the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
Address	Montréal; Québec; August 21-25, 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
Call Number	Admin @ si @ SBJ2022			Serial	3730



Author	Carlos Boned Riera; Oriol Ramos Terrades
Title	Discriminative Neural Variational Model for Unbalanced Classification Tasks in Knowledge Graph			Type	Conference Article
Year	2022	Publication	26th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	2186-2191
Keywords	Measurement; Couplings; Semantics; Ear; Benchmark testing; Data models; Pattern recognition
Abstract	Nowadays the paradigm of link discovery problems has shown significant improvements on Knowledge Graphs. However, method performance is harmed by the unbalanced nature of this classification problem, since many methods are easily biased towards not finding proper links. In this paper we present a discriminative neural variational auto-encoder model, called DNVAE from now on, in which we have introduced latent variables to serve as embedding vectors. As a result, the learnt generative model approximates the underlying distribution better and, at the same time, better differentiates the types of relations in the knowledge graph. We have evaluated this approach on a benchmark knowledge graph and on Census records. Results on the latter data set are quite impressive, since we reach the highest possible score in the evaluation metrics. However, further experiments are still needed to evaluate more thoroughly the performance of the method in more challenging tasks.
Address	Montreal; Quebec; Canada; August 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	DAG; 600.121; 600.162			Approved	no
Call Number	Admin @ si @ BoR2022			Serial	3741



Author	Vacit Oguz Yazici; Joost Van de Weijer; Longlong Yu
Title	Visual Transformers with Primal Object Queries for Multi-Label Image Classification			Type	Conference Article
Year	2022	Publication	26th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Multi-label image classification is about predicting a set of class labels that can be considered as orderless sequential data. Transformers process the sequential data as a whole; therefore, they are inherently good at set prediction. The first vision-based transformer model, which was proposed for the object detection task, introduced the concept of object queries. Object queries are learnable positional encodings that are used by attention modules in decoder layers to decode the object classes or bounding boxes using the regions of interest in an image. However, inputting the same set of object queries to different decoder layers hinders the training: it results in lower performance and delays convergence. In this paper, we propose the usage of primal object queries that are only provided at the start of the transformer decoder stack. In addition, we improve the mixup technique proposed for multi-label classification. The proposed transformer model with primal object queries improves the state-of-the-art class-wise F1 metric by 2.1% and 1.8%, and speeds up the convergence by 79.0% and 38.6%, on the MS-COCO and NUS-WIDE datasets respectively.
Address	Montreal; Quebec; Canada; August 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	LAMP; 600.147; 601.309			Approved	no
Call Number	Admin @ si @ YWY2022			Serial	3786



Author	Ayan Banerjee; Palaiahnakote Shivakumara; Parikshit Acharya; Umapada Pal; Josep Llados
Title	TWD: A New Deep E2E Model for Text Watermark Detection in Video Images			Type	Conference Article
Year	2022	Publication	26th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords	Deep learning; U-Net; FCENet; Scene text detection; Video text detection; Watermark text detection
Abstract	Text watermark detection in video images is challenging because text watermark characteristics are different from caption and scene texts in the video images. Developing a successful model for detecting text watermark, caption, and scene texts is an open challenge. This study aims at developing a new Deep End-to-End model for Text Watermark Detection (TWD), caption and scene text in video images. To standardize non-uniform contrast, quality, and resolution, we explore the U-Net3+ model for enhancing poor quality text without affecting high-quality text. Similarly, to address the challenges of arbitrary orientation, text shapes and complex background, we explore Stacked Hourglass Encoded Fourier Contour Embedding Network (SFCENet) by feeding the output of the U-Net3+ model as input. Furthermore, the proposed work integrates enhancement and detection models as an end-to-end model for detecting multi-type text in video images. To validate the proposed model, we create our own dataset (named TW-866), which provides video images containing text watermark, caption (subtitles), as well as scene text. The proposed model is also evaluated on standard natural scene text detection datasets, namely, ICDAR 2019 MLT, CTW1500, Total-Text, and DAST1500. The results show that the proposed method outperforms the existing methods. To the best of our knowledge, this is the first work on text watermark detection in video images.
Address	Montreal; Quebec; Canada; August 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	DAG;			Approved	no
Call Number	Admin @ si @ BSA2022			Serial	3788



Author	Aitor Alvarez-Gila; Joost Van de Weijer; Yaxing Wang; Estibaliz Garrote
Title	MVMO: A Multi-Object Dataset for Wide Baseline Multi-View Semantic Segmentation			Type	Conference Article
Year	2022	Publication	29th IEEE International Conference on Image Processing	Abbreviated Journal
Volume		Issue		Pages
Keywords	multi-view; cross-view; semantic segmentation; synthetic dataset
Abstract	We present MVMO (Multi-View, Multi-Object dataset): a synthetic dataset of 116,000 scenes containing randomly placed objects of 10 distinct classes and captured from 25 camera locations in the upper hemisphere. MVMO comprises photorealistic, path-traced image renders, together with semantic segmentation ground truth for every view. Unlike existing multi-view datasets, MVMO features wide baselines between cameras and high density of objects, which lead to large disparities, heavy occlusions and view-dependent object appearance. Single view semantic segmentation is hindered by self and inter-object occlusions that could benefit from additional viewpoints. Therefore, we expect that MVMO will propel research in multi-view semantic segmentation and cross-view semantic transfer. We also provide baselines that show that new research is needed in such fields to exploit the complementary information of multi-view setups.
Address	Bordeaux; France; October 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICIP
Notes	LAMP			Approved	no
Call Number	Admin @ si @ AWW2022			Serial	3781