Author Nil Ballus; Bhalaji Nagarajan; Petia Radeva
Title Opt-SSL: An Enhanced Self-Supervised Framework for Food Recognition Type Conference Article
Year 2022 Publication 10th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal
Volume 13256 Issue Pages
Keywords Self-supervised; Contrastive learning; Food recognition
Abstract Self-supervised learning has shown promising performance in several computer vision tasks. Popular contrastive methods make use of a Siamese architecture with different loss functions. In this work, we go deeper into two recent state-of-the-art frameworks, SimSiam and Barlow Twins. Inspired by them, we propose a new self-supervised learning method, Opt-SSL, that combines both image and feature contrasting. We validate the proposed method on the food recognition task, showing that our framework enables self-supervised networks to learn better visual representations.
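For readers unfamiliar with the two frameworks, the following is a minimal sketch of how image-level and feature-level contrasting could be combined; the function names, the symmetric weighting, and `lambda_bt` are illustrative assumptions, not the paper's actual Opt-SSL formulation.

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p, z):
    # Negative cosine similarity with stop-gradient on the target branch,
    # as in SimSiam (Chen & He, 2021). p: predictor output, z: projection.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def barlow_twins_loss(z1, z2, lambd=5e-3):
    # Cross-correlation matrix of the normalized embeddings, pushed towards
    # the identity, as in Barlow Twins (Zbontar et al., 2021).
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

def combined_ssl_loss(p1, p2, z1, z2, lambda_bt=1.0):
    # Hypothetical combination of the two objectives: a symmetric SimSiam
    # term (image contrasting) plus a Barlow Twins term (feature contrasting).
    sim = 0.5 * (simsiam_loss(p1, z2) + simsiam_loss(p2, z1))
    return sim + lambda_bt * barlow_twins_loss(z1, z2)
```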
Address Aveiro; Portugal; May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IbPRIA
Notes MILAB; not mentioned Approved no
Call Number Admin @ si @ BNR2022 Serial 3782
Permanent link to this record
 

 
Author Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger
Title Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network Type Journal Article
Year 2020 Publication Automation in Construction Abbreviated Journal AC
Volume 110 Issue Pages 102973
Keywords Semantic image segmentation; Deep learning
Abstract Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches have been widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of bridge surface structural damage detection, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damage requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area can be very small while the background area is large, which creates an unbalanced training environment; (iii) because it is difficult to exactly determine the extent of the damage, there is often variation among labelers who perform pixel-wise annotation. In this paper, we propose a novel model for bridge structural damage detection that addresses the first two challenges. We build on the atrous spatial pyramid pooling (ASPP) module to design a novel network for bridge damage detection. Further, we introduce a weight-balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection compared to cross-entropy loss or focal loss, and (ii) the proposed model detects minority classes better than other lightweight segmentation networks.
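As an illustration of the kind of objective the abstract refers to, here is a minimal sketch of a class-weighted soft IoU loss; the paper's exact weight-balancing scheme is not reproduced here, so the formulation and `class_weights` are assumptions.

```python
import torch

def weighted_soft_iou_loss(logits, target_onehot, class_weights, eps=1e-6):
    # Differentiable (soft) IoU loss with per-class weights, e.g. inverse
    # class frequencies, to counter the damage/background imbalance.
    # logits, target_onehot: (N, C, H, W); class_weights: (C,).
    probs = torch.softmax(logits, dim=1)
    dims = (0, 2, 3)
    inter = (probs * target_onehot).sum(dims)
    union = (probs + target_onehot - probs * target_onehot).sum(dims)
    iou = (inter + eps) / (union + eps)
    return (class_weights * (1.0 - iou)).sum() / class_weights.sum()
```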
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ DMK2020 Serial 3314
Permanent link to this record
 

 
Author R. de Nijs; Sebastian Ramos; Gemma Roig; Xavier Boix; Luc Van Gool; K. Kühnlenz
Title On-line Semantic Perception Using Uncertainty Type Conference Article
Year 2012 Publication International Conference on Intelligent Robots and Systems Abbreviated Journal IROS
Volume Issue Pages 4185-4191
Keywords Semantic Segmentation
Abstract Visual perception capabilities are still highly unreliable in unconstrained settings, and solutions might not be accurate in all regions of an image. Awareness of the uncertainty of perception is a fundamental requirement for proper high-level decision making in a robotic system. Yet, the uncertainty measure is often sacrificed to account for dependencies between object/region classifiers. This is the case of Conditional Random Fields (CRFs), whose success stems from their ability to infer the most likely world configuration, but which do not directly allow estimating the uncertainty of the solution. In this paper, we consider the setting of assigning semantic labels to the pixels of an image sequence. Instead of a CRF, we employ a Perturb-and-MAP Random Field, a recently introduced probabilistic model that allows fast approximate sampling from its probability density function. This makes it possible to effectively compute the uncertainty of the solution, indicating the reliability of the most likely labeling in each region of the image. We report results on the CamVid dataset, a standard benchmark for semantic labeling of urban image sequences. In our experiments, we show the benefits of exploiting the uncertainty by putting more computational effort into the regions of the image that are less reliable, while using more efficient techniques in other regions, with little decrease in performance.
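The Perturb-and-MAP idea can be sketched as follows: perturb the unary potentials with Gumbel noise, solve one MAP problem per perturbation, and read per-pixel uncertainty off the agreement between samples. This is an illustration under stated assumptions, not the authors' implementation; `map_solver` stands for an assumed black-box MAP routine (e.g. graph cuts).

```python
import numpy as np

def perturb_and_map_marginals(unary, map_solver, num_samples=20):
    # unary: (H, W, L) unary potentials (energies to be minimized).
    # Returns per-pixel label marginals; their entropy signals uncertainty.
    h, w, num_labels = unary.shape
    counts = np.zeros((h, w, num_labels))
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    for _ in range(num_samples):
        gumbel = -np.log(-np.log(np.random.uniform(size=unary.shape)))
        labeling = map_solver(unary - gumbel)  # MAP on the perturbed energy
        counts[rows, cols, labeling] += 1
    return counts / num_samples
```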
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IROS
Notes ADAS Approved no
Call Number ADAS @ adas @ NRR2012 Serial 2378
Permanent link to this record
 

 
Author Gemma Roig; Xavier Boix; R. de Nijs; Sebastian Ramos; K. Kühnlenz; Luc Van Gool
Title Active MAP Inference in CRFs for Efficient Semantic Segmentation Type Conference Article
Year 2013 Publication 15th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 2312-2319
Keywords Semantic Segmentation
Abstract Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than that of MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to select on the fly a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such an incomplete energy function. Results on semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy with major efficiency gains.
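A toy sketch of the selection idea, under the strong simplifying assumption that the entropy of a cheap estimate decides which costly potentials to instantiate (the paper's actual criterion reasons about the effect on the MAP labeling itself); all names below are hypothetical.

```python
import numpy as np

def active_map(nodes, cheap_probs, expensive_classifier, budget, infer_map):
    # cheap_probs: (N, L) label probabilities from a fast, rough classifier.
    # Only the `budget` most uncertain nodes get the slow classifier; the
    # rest keep their cheap potentials. `infer_map` runs MAP inference on
    # the resulting partially instantiated energy.
    entropy = -(cheap_probs * np.log(cheap_probs + 1e-12)).sum(axis=1)
    chosen = np.argsort(-entropy)[:budget]
    expensive = {i: expensive_classifier(nodes[i]) for i in chosen}
    return infer_map(cheap_probs, expensive)
```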
Address Sydney; Australia; December 2013
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1550-5499 ISBN Medium
Area Expedition Conference ICCV
Notes ADAS; 600.057 Approved no
Call Number ADAS @ adas @ RBN2013 Serial 2377
Permanent link to this record
 

 
Author Simon Jégou; Michal Drozdzal; David Vazquez; Adriana Romero; Yoshua Bengio
Title The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation Type Conference Article
Year 2017 Publication IEEE Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal
Volume Issue Pages
Keywords Semantic Segmentation
Abstract State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.

Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train.

In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any post-processing module or pretraining. Moreover, due to the smart construction of the model, our approach has far fewer parameters than the currently published best entries for these datasets.
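The dense connectivity the abstract builds on can be sketched with a minimal PyTorch dense block; layer sizes are illustrative, and returning only the newly produced feature maps follows the FC-DenseNet design rather than the original DenseNet.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Each layer receives the concatenation of the block input and all
    # feature maps produced by the previous layers.
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            )
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features[1:], dim=1)  # only the new feature maps
```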
Address Honolulu; USA; July 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes MILAB; ADAS; 600.076; 600.085; 601.281 Approved no
Call Number ADAS @ adas @ JDV2016 Serial 2866
Permanent link to this record
 

 
Author Mark Philip Philipsen; Jacob Velling Dueholm; Anders Jorgensen; Sergio Escalera; Thomas B. Moeslund
Title Organ Segmentation in Poultry Viscera Using RGB-D Type Journal Article
Year 2018 Publication Sensors Abbreviated Journal SENS
Volume 18 Issue 1 Pages 117
Keywords semantic segmentation; RGB-D; random forest; conditional random field; 2D; 3D; CNN
Abstract We present a pattern recognition framework for semantic segmentation of visual structures, that is, multi-class labelling at pixel level, and apply it to the task of segmenting organs in the eviscerated viscera from slaughtered poultry in RGB-D images. This is a step towards replacing the current strenuous manual inspection at poultry processing plants. Features are extracted from feature maps such as activation maps from a convolutional neural network (CNN). A random forest classifier assigns class probabilities, which are further refined by utilizing context in a conditional random field. The presented method is compatible with both 2D and 3D features, which allows us to explore the value of adding 3D and CNN-derived features. The dataset consists of 604 RGB-D images showing 151 unique sets of eviscerated viscera from four different perspectives. A mean Jaccard index of 78.11% is achieved across the four classes of organs by using features derived from 2D, 3D and a CNN, compared to 74.28% using only basic 2D image features.
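A minimal sketch of the per-pixel classification stage described above, using scikit-learn; the stacking of 2D, 3D and CNN-derived feature maps and the CRF refinement are assumed to happen elsewhere.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_pixel_forest(feature_maps, labels, n_estimators=100):
    # feature_maps: (N, H, W, F) stacked per-pixel features;
    # labels: (N, H, W) integer organ classes.
    X = feature_maps.reshape(-1, feature_maps.shape[-1])
    y = labels.reshape(-1)
    clf = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1)
    clf.fit(X, y)
    return clf  # clf.predict_proba gives class probabilities fed to the CRF
```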
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ PVJ2018 Serial 3072
Permanent link to this record
 

 
Author Albert Ali Salah; E. Pauwels; R. Tavenard; Theo Gevers
Title T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data Type Journal Article
Year 2010 Publication Sensors Abbreviated Journal SENS
Volume 10 Issue 8 Pages 7496-7513
Keywords sensor networks; temporal pattern extraction; T-patterns; Lempel-Ziv; Gaussian mixture model; MERL motion data
Abstract The trend to use large numbers of simple sensors, as opposed to a few complex ones, to monitor places and systems creates a need for temporal pattern mining algorithms that work on such data. Existing methods for discovering re-usable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem and extend the T-Pattern algorithm, which was previously applied for the detection of sequential patterns in the behavioural sciences. The temporal complexity of the T-Pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model that yields a fast and robust algorithm for finding patterns in temporal data. We test our algorithm on a recent database, containing millions of events, collected with passive infrared sensors.
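The core statistical question behind T-patterns can be illustrated with a crude critical-interval test: does event B follow event A within a window more often than an independence (Poisson) null predicts? This sketch shows the principle only; the actual algorithm searches over interval lengths and builds pattern hierarchies.

```python
import numpy as np

def critical_interval_test(times_a, times_b, window, total_time):
    # times_a, times_b: sorted event timestamps of two event types.
    times_b = np.asarray(times_b)
    hits = sum(np.any((times_b > t) & (times_b <= t + window)) for t in times_a)
    p_null = 1.0 - np.exp(-(len(times_b) / total_time) * window)  # Poisson
    expected = p_null * len(times_a)
    var = len(times_a) * p_null * (1.0 - p_null)
    z = (hits - expected) / np.sqrt(var + 1e-12)
    return z > 1.645  # one-sided significance at roughly the 5% level
```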
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ALTRES; ISE Approved no
Call Number Admin @ si @ SPT2010 Serial 1845
Permanent link to this record
 

 
Author Pau Rodriguez; Diego Velazquez; Guillem Cucurull; Josep M. Gonfaus; Xavier Roca; Seiichi Ozawa; Jordi Gonzalez
Title Personality Trait Analysis in Social Networks Based on Weakly Supervised Learning of Shared Images Type Journal Article
Year 2020 Publication Applied Sciences Abbreviated Journal APPLSCI
Volume 10 Issue 22 Pages 8170
Keywords sentiment analysis; personality trait analysis; weakly supervised learning; visual classification; OCEAN model; social networks
Abstract Social networks have attracted the attention of psychologists, as the behavior of users can be used to assess personality traits, and to detect sentiments and critical mental situations such as depression or suicidal tendencies. Recently, the increasing amount of image uploads to social networks has shifted the focus from text to image-based personality assessment. However, obtaining the ground-truth requires giving personality questionnaires to the users, making the process very costly and slow, and hindering research on large populations. In this paper, we demonstrate that it is possible to predict which images are most associated with each personality trait of the OCEAN personality model, without requiring ground-truth personality labels. Namely, we present a weakly supervised framework which shows that the personality scores obtained using specific images textually associated with particular personality traits are highly correlated with scores obtained using standard text-based personality questionnaires. We trained an OCEAN trait model based on Convolutional Neural Networks (CNNs), learned from 120K pictures posted with specific textual hashtags, to infer whether the personality scores from the images uploaded by users are consistent with those scores obtained from text. In order to validate our claims, we performed a personality test on a heterogeneous group of 280 human subjects, showing that our model successfully predicts which kind of image will match a person with a given level of a trait. Looking at the results, we obtained evidence that personality is not only correlated with text, but with image content too. Interestingly, different visual patterns emerged from those images most liked by persons with a particular personality trait: for instance, pictures most associated with high conscientiousness usually contained healthy food, while low conscientiousness pictures contained injuries, guns, and alcohol. These findings could pave the way to complement text-based personality questionnaires with image-based questions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.119 Approved no
Call Number Admin @ si @ RVC2020b Serial 3553
Permanent link to this record
 

 
Author Bartlomiej Twardowski; Pawel Zawistowski; Szymon Zaborowski
Title Metric Learning for Session-Based Recommendations Type Conference Article
Year 2021 Publication 43rd edition of the annual BCS-IRSG European Conference on Information Retrieval Abbreviated Journal
Volume 12656 Issue Pages 650-665
Keywords Session-based recommendations; Deep metric learning; Learning to rank
Abstract Session-based recommenders, which make predictions from users' uninterrupted sequences of actions, are attractive for many applications. For this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures the dissimilarity between the provided sequence of user events and the next action. We discuss and compare metric learning approaches with commonly used learning-to-rank methods, with which some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither extensively big nor deep architectures are necessary to outperform existing methods. Experimental results against strong baselines on four datasets are provided, together with an ablation study.
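A minimal sketch of such a metric learning objective: a triplet-style loss in a shared session/item embedding space, pulling the session towards the item actually consumed next and pushing it away from a sampled negative. The margin, the Euclidean distance, and the negative sampling are assumptions, not the paper's exact setup.

```python
import torch.nn.functional as F

def session_triplet_loss(session_emb, pos_item_emb, neg_item_emb, margin=0.3):
    # All three inputs: (batch, dim) embeddings in the common space.
    d_pos = F.pairwise_distance(session_emb, pos_item_emb)
    d_neg = F.pairwise_distance(session_emb, neg_item_emb)
    return F.relu(d_pos - d_neg + margin).mean()
```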
Address Virtual; March 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECIR
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ TZZ2021 Serial 3586
Permanent link to this record
 

 
Author Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund
Title Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification Type Journal Article
Year 2022 Publication Automation in Construction Abbreviated Journal AC
Volume 144 Issue Pages 104614
Keywords Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection
Abstract A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated non-locally at different scales through a lightweight vision transformer, and a smaller set of tokens is produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
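The clustering step of the tokenizer can be sketched with standard Sinkhorn-Knopp iterations, which turn token-to-cluster affinities into balanced soft assignments; the hyper-parameters and the final pooling are illustrative assumptions, not the paper's exact tokenizer.

```python
import torch

def sinkhorn_assign(scores, num_iters=3, eps=0.05):
    # scores: (N_tokens, K_clusters) affinity logits between spatial tokens
    # and cluster centers.
    q = torch.exp(scores / eps)
    for _ in range(num_iters):
        q = q / q.sum(dim=0, keepdim=True)  # balance mass across tokens
        q = q / q.sum(dim=1, keepdim=True)  # each token sums to one
    return q  # soft assignments used to pool token features per cluster
```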
Address Dec 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA Approved no
Call Number Admin @ si @ BME2022c Serial 3780
Permanent link to this record
 

 
Author Egils Avots; Meysam Madadi; Sergio Escalera; Jordi Gonzalez; Xavier Baro; Paul Pallin; Gholamreza Anbarjafari
Title From 2D to 3D geodesic-based garment matching Type Journal Article
Year 2019 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 78 Issue 18 Pages 25829–25853
Keywords Shape matching; Geodesic distance; Texture mapping; RGBD image processing; Gaussian mixture model
Abstract A new approach for 2D to 3D garment retexturing is proposed based on Gaussian mixture models and thin plate splines (TPS). An automatically segmented garment of an individual is matched to a new source garment and rendered, resulting in augmented images in which the target garment has been retextured using the texture of the source garment. We divide the problem into garment boundary matching, based on Gaussian mixture models, and interpolation of inner points using surface topology extracted through geodesic paths, which leads to more realistic results than standard approaches. We evaluated and compared our system quantitatively using the root mean square (RMS) error and qualitatively using the mean opinion score (MOS), showing the benefits of the proposed methodology on our gathered dataset.
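The TPS interpolation step can be sketched with SciPy's radial basis function interpolator, assuming the boundary correspondences have already been established by the GMM matching.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_warp(src_boundary, dst_boundary, interior_pts):
    # src_boundary, dst_boundary: (N, 2) matched boundary points;
    # interior_pts: (M, 2) inner points of the source garment.
    tps = RBFInterpolator(src_boundary, dst_boundary,
                          kernel="thin_plate_spline")
    return tps(interior_pts)  # (M, 2) warped positions in the target garment
```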
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; ISE; 600.098; 600.119; 602.133 Approved no
Call Number Admin @ si @ AME2019 Serial 3317
Permanent link to this record
 

 
Author Jon Almazan; Alicia Fornes; Ernest Valveny
Title A non-rigid appearance model for shape description and recognition Type Journal Article
Year 2012 Publication Pattern Recognition Abbreviated Journal PR
Volume 45 Issue 9 Pages 3105-3113
Keywords Shape recognition; Deformable models; Shape modeling; Hand-drawn recognition
Abstract In this paper we describe a framework to learn a model of shape variability in a set of patterns. The framework is based on the Active Appearance Model (AAM) and makes it possible to combine shape deformations with appearance variability. We use two modifications of the Blurred Shape Model (BSM) descriptor as basic shape and appearance features to learn the model. These modifications overcome the rigidity of the original BSM, adapting it to the deformations of the shape to be represented. We have applied this framework to the representation and classification of handwritten digits and symbols, and show that the results of the proposed methodology outperform the original BSM approach.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number DAG @ dag @ AFV2012 Serial 1982
Permanent link to this record
 

 
Author Gemma Roig; Xavier Boix; F. de la Torre; Joan Serrat; C. Vilella
Title Hierarchical CRF with product label spaces for parts-based Models Type Conference Article
Year 2011 Publication IEEE Conference on Automatic Face and Gesture Recognition Abbreviated Journal
Volume Issue Pages 657-664
Keywords Shape; Computational modeling; Principal component analysis; Random variables; Color; Upper bound; Facial features
Abstract Non-rigid object detection is a challenging and open research problem in computer vision. It is a critical part of many applications such as image search, surveillance, human-computer interaction, and image auto-annotation. Most successful approaches to non-rigid object detection make use of part-based models. In particular, Conditional Random Fields (CRFs) have been successfully embedded into a discriminative parts-based model framework due to their effectiveness for learning and inference (usually based on a tree structure). However, CRF-based approaches do not incorporate global constraints and only model pairwise interactions. This is especially important when modeling object classes that may have complex parts interactions (e.g. facial features or body articulations), because neglecting them yields an oversimplified model with suboptimal performance. To overcome this limitation, this paper proposes a novel hierarchical CRF (HCRF). The main contribution is to build a hierarchy of part combinations by extending the label set to a hierarchy of product label spaces. In order to keep the inference computation tractable, we propose an effective method to reduce the new label set. We test our method on two applications: facial feature detection on the Multi-PIE database and human pose estimation on the Buffy dataset.
Address Santa Barbara, CA, USA, 2011
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference FG
Notes ADAS Approved no
Call Number Admin @ si @ RBT2011 Serial 1862
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera; Vassilis Athitsos; Mohammad Sabokrou
Title All You Need In Sign Language Production Type Miscellaneous
Year 2022 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords Sign Language Production; Sign Language Recognition; Sign Language Translation; Deep Learning; Survey; Deaf
Abstract Sign language is the dominant form of communication used in the deaf and hearing-impaired community. To enable easy and mutual communication between the hearing-impaired and hearing communities, building a robust system capable of translating spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are the two necessary parts of such a two-way system, and both need to cope with critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. To give a more realistic perspective on sign language, we present an introduction to Deaf culture, Deaf centers, the psychological perspective of sign language, and the main differences between spoken language and sign language. Furthermore, we present the fundamental components of a bi-directional sign language translation system and discuss the main challenges in this area. The backbone architectures and methods in SLP are briefly introduced, and a proposed taxonomy on SLP is presented. Finally, we present a general framework for SLP and performance evaluation, and discuss recent developments, advantages, and limitations in SLP, commenting on possible lines of future research.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; not mentioned Approved no
Call Number Admin @ si @ RKE2022c Serial 3698
Permanent link to this record
 

 
Author Aura Hernandez-Sabate; Lluis Albarracin; Daniel Calvo; Nuria Gorgorio
Title EyeMath: Identifying Mathematics Problem Solving Processes in a RTS Video Game Type Conference Article
Year 2016 Publication 5th International Conference Games and Learning Alliance Abbreviated Journal
Volume 10056 Issue Pages 50-59
Keywords Simulation environment; Automated Driving; Driver-Vehicle interaction
Abstract Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment, with the aim of highlighting its utility for level 3 driving automation. In particular, an experiment is performed in the presented simulator to explore the influence of different variables regarding control transfer of the car after the system has been driving autonomously in a highway scenario. The results show that the speed of the car at the moment when the system needs to transfer control to the human driver is critical.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference GALA
Notes ADAS; IAM Approved no
Call Number HAC2016 Serial 2864
Permanent link to this record