Author Nil Ballus; Bhalaji Nagarajan; Petia Radeva
Title Opt-SSL: An Enhanced Self-Supervised Framework for Food Recognition Type Conference Article
Year 2022 Publication 10th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal
Volume 13256 Issue Pages
Keywords Self-supervised; Contrastive learning; Food recognition
Abstract Self-supervised learning has shown promising performance in several computer vision tasks. Popular contrastive methods make use of a Siamese architecture with different loss functions. In this work, we go deeper into two recent state-of-the-art frameworks, SimSiam and Barlow Twins. Inspired by them, we propose a new self-supervised learning method, Opt-SSL, that combines both image and feature contrasting. We validate the proposed method on the food recognition task, showing that our framework enables self-supervised networks to learn better visual representations.
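For readers unfamiliar with the two frameworks, the following is a minimal sketch of how image-level and feature-level contrasting could be combined; the function names, the symmetric weighting, and `lambda_bt` are illustrative assumptions, not the paper's actual Opt-SSL formulation.

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p, z):
    # Negative cosine similarity with stop-gradient on the target branch,
    # as in SimSiam (Chen & He, 2021). p: predictor output, z: projection.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def barlow_twins_loss(z1, z2, lambd=5e-3):
    # Cross-correlation matrix of the normalized embeddings, pushed towards
    # the identity, as in Barlow Twins (Zbontar et al., 2021).
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

def combined_ssl_loss(p1, p2, z1, z2, lambda_bt=1.0):
    # Hypothetical combination of the two objectives: a symmetric SimSiam
    # term (image contrasting) plus a Barlow Twins term (feature contrasting).
    sim = 0.5 * (simsiam_loss(p1, z2) + simsiam_loss(p2, z1))
    return sim + lambda_bt * barlow_twins_loss(z1, z2)
```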
Address Aveiro; Portugal; May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IbPRIA
Notes MILAB; not mentioned Approved no
Call Number Admin @ si @ BNR2022 Serial 3782
Permanent link to this record
 

 
Author Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger
Title Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network Type Journal Article
Year 2020 Publication Automation in Construction Abbreviated Journal AC
Volume 110 Issue Pages 102973
Keywords Semantic image segmentation; Deep learning
Abstract Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches have been widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of bridge surface structural damage detection, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damage requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area can be very small while the background area is large, which creates an unbalanced training environment; (iii) because it is difficult to exactly determine the extent of the damage, there is often variation among labelers who perform pixel-wise annotation. In this paper, we propose a novel model for bridge structural damage detection that addresses the first two challenges. We build on the atrous spatial pyramid pooling (ASPP) module to design a novel network for bridge damage detection. Further, we introduce a weight-balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection compared to cross-entropy loss or focal loss, and (ii) the proposed model detects minority classes better than other lightweight segmentation networks.
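As an illustration of the kind of objective the abstract refers to, here is a minimal sketch of a class-weighted soft IoU loss; the paper's exact weight-balancing scheme is not reproduced here, so the formulation and `class_weights` are assumptions.

```python
import torch

def weighted_soft_iou_loss(logits, target_onehot, class_weights, eps=1e-6):
    # Differentiable (soft) IoU loss with per-class weights, e.g. inverse
    # class frequencies, to counter the damage/background imbalance.
    # logits, target_onehot: (N, C, H, W); class_weights: (C,).
    probs = torch.softmax(logits, dim=1)
    dims = (0, 2, 3)
    inter = (probs * target_onehot).sum(dims)
    union = (probs + target_onehot - probs * target_onehot).sum(dims)
    iou = (inter + eps) / (union + eps)
    return (class_weights * (1.0 - iou)).sum() / class_weights.sum()
```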
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ DMK2020 Serial 3314
Permanent link to this record
 

 
Author R. de Nijs; Sebastian Ramos; Gemma Roig; Xavier Boix; Luc Van Gool; K. Kühnlenz
Title On-line Semantic Perception Using Uncertainty Type Conference Article
Year 2012 Publication International Conference on Intelligent Robots and Systems Abbreviated Journal IROS
Volume Issue Pages 4185-4191
Keywords Semantic Segmentation
Abstract Visual perception capabilities are still highly unreliable in unconstrained settings, and solutions might not be accurate in all regions of an image. Awareness of the uncertainty of perception is a fundamental requirement for proper high-level decision making in a robotic system. Yet, the uncertainty measure is often sacrificed to account for dependencies between object/region classifiers. This is the case of Conditional Random Fields (CRFs), whose success stems from their ability to infer the most likely world configuration, but which do not directly allow estimating the uncertainty of the solution. In this paper, we consider the setting of assigning semantic labels to the pixels of an image sequence. Instead of a CRF, we employ a Perturb-and-MAP Random Field, a recently introduced probabilistic model that allows fast approximate sampling from its probability density function. This makes it possible to effectively compute the uncertainty of the solution, indicating the reliability of the most likely labeling in each region of the image. We report results on the CamVid dataset, a standard benchmark for semantic labeling of urban image sequences. In our experiments, we show the benefits of exploiting the uncertainty by putting more computational effort into the regions of the image that are less reliable, while using more efficient techniques in other regions, with little decrease in performance.
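The Perturb-and-MAP idea can be sketched as follows: perturb the unary potentials with Gumbel noise, solve one MAP problem per perturbation, and read per-pixel uncertainty off the agreement between samples. This is an illustration under stated assumptions, not the authors' implementation; `map_solver` stands for an assumed black-box MAP routine (e.g. graph cuts).

```python
import numpy as np

def perturb_and_map_marginals(unary, map_solver, num_samples=20):
    # unary: (H, W, L) unary potentials (energies to be minimized).
    # Returns per-pixel label marginals; their entropy signals uncertainty.
    h, w, num_labels = unary.shape
    counts = np.zeros((h, w, num_labels))
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    for _ in range(num_samples):
        gumbel = -np.log(-np.log(np.random.uniform(size=unary.shape)))
        labeling = map_solver(unary - gumbel)  # MAP on the perturbed energy
        counts[rows, cols, labeling] += 1
    return counts / num_samples
```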
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IROS
Notes ADAS Approved no
Call Number ADAS @ adas @ NRR2012 Serial 2378
Permanent link to this record
 

 
Author Gemma Roig; Xavier Boix; R. de Nijs; Sebastian Ramos; K. Kühnlenz; Luc Van Gool
Title Active MAP Inference in CRFs for Efficient Semantic Segmentation Type Conference Article
Year 2013 Publication 15th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 2312-2319
Keywords Semantic Segmentation
Abstract Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than that of MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to select on the fly a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such an incomplete energy function. Results on semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy with major efficiency gains.
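A toy sketch of the selection idea, under the strong simplifying assumption that the entropy of a cheap estimate decides which costly potentials to instantiate (the paper's actual criterion reasons about the effect on the MAP labeling itself); all names below are hypothetical.

```python
import numpy as np

def active_map(nodes, cheap_probs, expensive_classifier, budget, infer_map):
    # cheap_probs: (N, L) label probabilities from a fast, rough classifier.
    # Only the `budget` most uncertain nodes get the slow classifier; the
    # rest keep their cheap potentials. `infer_map` runs MAP inference on
    # the resulting partially instantiated energy.
    entropy = -(cheap_probs * np.log(cheap_probs + 1e-12)).sum(axis=1)
    chosen = np.argsort(-entropy)[:budget]
    expensive = {i: expensive_classifier(nodes[i]) for i in chosen}
    return infer_map(cheap_probs, expensive)
```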
Address Sydney; Australia; December 2013
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1550-5499 ISBN Medium
Area Expedition Conference ICCV
Notes ADAS; 600.057 Approved no
Call Number ADAS @ adas @ RBN2013 Serial 2377
Permanent link to this record
 

 
Author Simon Jégou; Michal Drozdzal; David Vazquez; Adriana Romero; Yoshua Bengio
Title The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation Type Conference Article
Year 2017 Publication IEEE Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal
Volume Issue Pages
Keywords Semantic Segmentation
Abstract State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.

Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train.

In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any post-processing module or pretraining. Moreover, due to the smart construction of the model, our approach has far fewer parameters than the currently published best entries for these datasets.
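The dense connectivity the abstract builds on can be sketched with a minimal PyTorch dense block; layer sizes are illustrative, and returning only the newly produced feature maps follows the FC-DenseNet design rather than the original DenseNet.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Each layer receives the concatenation of the block input and all
    # feature maps produced by the previous layers.
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            )
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features[1:], dim=1)  # only the new feature maps
```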
Address Honolulu; USA; July 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes MILAB; ADAS; 600.076; 600.085; 601.281 Approved no
Call Number ADAS @ adas @ JDV2016 Serial 2866
Permanent link to this record
 

 
Author Mark Philip Philipsen; Jacob Velling Dueholm; Anders Jorgensen; Sergio Escalera; Thomas B. Moeslund
Title Organ Segmentation in Poultry Viscera Using RGB-D Type Journal Article
Year 2018 Publication Sensors Abbreviated Journal SENS
Volume 18 Issue 1 Pages 117
Keywords semantic segmentation; RGB-D; random forest; conditional random field; 2D; 3D; CNN
Abstract We present a pattern recognition framework for semantic segmentation of visual structures, that is, multi-class labelling at pixel level, and apply it to the task of segmenting organs in the eviscerated viscera from slaughtered poultry in RGB-D images. This is a step towards replacing the current strenuous manual inspection at poultry processing plants. Features are extracted from feature maps such as activation maps from a convolutional neural network (CNN). A random forest classifier assigns class probabilities, which are further refined by utilizing context in a conditional random field. The presented method is compatible with both 2D and 3D features, which allows us to explore the value of adding 3D and CNN-derived features. The dataset consists of 604 RGB-D images showing 151 unique sets of eviscerated viscera from four different perspectives. A mean Jaccard index of 78.11% is achieved across the four classes of organs by using features derived from 2D, 3D and a CNN, compared to 74.28% using only basic 2D image features.
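A minimal sketch of the per-pixel classification stage described above, using scikit-learn; the stacking of 2D, 3D and CNN-derived feature maps and the CRF refinement are assumed to happen elsewhere.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_pixel_forest(feature_maps, labels, n_estimators=100):
    # feature_maps: (N, H, W, F) stacked per-pixel features;
    # labels: (N, H, W) integer organ classes.
    X = feature_maps.reshape(-1, feature_maps.shape[-1])
    y = labels.reshape(-1)
    clf = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1)
    clf.fit(X, y)
    return clf  # clf.predict_proba gives class probabilities fed to the CRF
```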
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ PVJ2018 Serial 3072
Permanent link to this record
 

 
Author Albert Ali Salah; E. Pauwels; R. Tavenard; Theo Gevers
Title T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data Type Journal Article
Year 2010 Publication Sensors Abbreviated Journal SENS
Volume 10 Issue 8 Pages 7496-7513
Keywords sensor networks; temporal pattern extraction; T-patterns; Lempel-Ziv; Gaussian mixture model; MERL motion data
Abstract The trend to use large numbers of simple sensors, as opposed to a few complex ones, to monitor places and systems creates a need for temporal pattern mining algorithms that work on such data. Existing methods for discovering re-usable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem and extend the T-Pattern algorithm, which was previously applied for the detection of sequential patterns in the behavioural sciences. The temporal complexity of the T-Pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model that yields a fast and robust algorithm for finding patterns in temporal data. We test our algorithm on a recent database, containing millions of events, collected with passive infrared sensors.
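The core statistical question behind T-patterns can be illustrated with a crude critical-interval test: does event B follow event A within a window more often than an independence (Poisson) null predicts? This sketch shows the principle only; the actual algorithm searches over interval lengths and builds pattern hierarchies.

```python
import numpy as np

def critical_interval_test(times_a, times_b, window, total_time):
    # times_a, times_b: sorted event timestamps of two event types.
    times_b = np.asarray(times_b)
    hits = sum(np.any((times_b > t) & (times_b <= t + window)) for t in times_a)
    p_null = 1.0 - np.exp(-(len(times_b) / total_time) * window)  # Poisson
    expected = p_null * len(times_a)
    var = len(times_a) * p_null * (1.0 - p_null)
    z = (hits - expected) / np.sqrt(var + 1e-12)
    return z > 1.645  # one-sided significance at roughly the 5% level
```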
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ALTRES; ISE Approved no
Call Number Admin @ si @ SPT2010 Serial 1845
Permanent link to this record
 

 
Author Pau Rodriguez; Diego Velazquez; Guillem Cucurull; Josep M. Gonfaus; Xavier Roca; Seiichi Ozawa; Jordi Gonzalez
Title Personality Trait Analysis in Social Networks Based on Weakly Supervised Learning of Shared Images Type Journal Article
Year 2020 Publication Applied Sciences Abbreviated Journal APPLSCI
Volume 10 Issue 22 Pages 8170
Keywords sentiment analysis; personality trait analysis; weakly supervised learning; visual classification; OCEAN model; social networks
Abstract Social networks have attracted the attention of psychologists, as the behavior of users can be used to assess personality traits, and to detect sentiments and critical mental situations such as depression or suicidal tendencies. Recently, the increasing amount of image uploads to social networks has shifted the focus from text to image-based personality assessment. However, obtaining the ground-truth requires giving personality questionnaires to the users, making the process very costly and slow, and hindering research on large populations. In this paper, we demonstrate that it is possible to predict which images are most associated with each personality trait of the OCEAN personality model, without requiring ground-truth personality labels. Namely, we present a weakly supervised framework which shows that the personality scores obtained using specific images textually associated with particular personality traits are highly correlated with scores obtained using standard text-based personality questionnaires. We trained an OCEAN trait model based on Convolutional Neural Networks (CNNs), learned from 120K pictures posted with specific textual hashtags, to infer whether the personality scores from the images uploaded by users are consistent with those scores obtained from text. In order to validate our claims, we performed a personality test on a heterogeneous group of 280 human subjects, showing that our model successfully predicts which kind of image will match a person with a given level of a trait. Looking at the results, we obtained evidence that personality is not only correlated with text, but with image content too. Interestingly, different visual patterns emerged from those images most liked by persons with a particular personality trait: for instance, pictures most associated with high conscientiousness usually contained healthy food, while low conscientiousness pictures contained injuries, guns, and alcohol. These findings could pave the way to complement text-based personality questionnaires with image-based questions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.119 Approved no
Call Number Admin @ si @ RVC2020b Serial 3553
Permanent link to this record
 

 
Author Bartlomiej Twardowski; Pawel Zawistowski; Szymon Zaborowski
Title Metric Learning for Session-Based Recommendations Type Conference Article
Year 2021 Publication 43rd edition of the annual BCS-IRSG European Conference on Information Retrieval Abbreviated Journal
Volume 12656 Issue Pages 650-665
Keywords Session-based recommendations; Deep metric learning; Learning to rank
Abstract Session-based recommenders, which make predictions from users' uninterrupted sequences of actions, are attractive for many applications. For this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures the dissimilarity between the provided sequence of user events and the next action. We discuss and compare metric learning approaches with commonly used learning-to-rank methods, with which some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither extensively big nor deep architectures are necessary to outperform existing methods. Experimental results against strong baselines on four datasets are provided, together with an ablation study.
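A minimal sketch of such a metric learning objective: a triplet-style loss in a shared session/item embedding space, pulling the session towards the item actually consumed next and pushing it away from a sampled negative. The margin, the Euclidean distance, and the negative sampling are assumptions, not the paper's exact setup.

```python
import torch.nn.functional as F

def session_triplet_loss(session_emb, pos_item_emb, neg_item_emb, margin=0.3):
    # All three inputs: (batch, dim) embeddings in the common space.
    d_pos = F.pairwise_distance(session_emb, pos_item_emb)
    d_neg = F.pairwise_distance(session_emb, neg_item_emb)
    return F.relu(d_pos - d_neg + margin).mean()
```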
Address Virtual; March 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECIR
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ TZZ2021 Serial 3586
Permanent link to this record
 

 
Author Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund
Title Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification Type Journal Article
Year 2022 Publication Automation in Construction Abbreviated Journal AC
Volume 144 Issue Pages 104614
Keywords Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection
Abstract A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated non-locally at different scales through a lightweight vision transformer, and a smaller set of tokens is produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
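The clustering step of the tokenizer can be sketched with standard Sinkhorn-Knopp iterations, which turn token-to-cluster affinities into balanced soft assignments; the hyper-parameters and the final pooling are illustrative assumptions, not the paper's exact tokenizer.

```python
import torch

def sinkhorn_assign(scores, num_iters=3, eps=0.05):
    # scores: (N_tokens, K_clusters) affinity logits between spatial tokens
    # and cluster centers.
    q = torch.exp(scores / eps)
    for _ in range(num_iters):
        q = q / q.sum(dim=0, keepdim=True)  # balance mass across tokens
        q = q / q.sum(dim=1, keepdim=True)  # each token sums to one
    return q  # soft assignments used to pool token features per cluster
```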
Address Dec 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA Approved no
Call Number Admin @ si @ BME2022c Serial 3780
Permanent link to this record
 

 
Author Egils Avots; Meysam Madadi; Sergio Escalera; Jordi Gonzalez; Xavier Baro; Paul Pallin; Gholamreza Anbarjafari
Title From 2D to 3D geodesic-based garment matching Type Journal Article
Year 2019 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 78 Issue 18 Pages 25829–25853
Keywords Shape matching; Geodesic distance; Texture mapping; RGBD image processing; Gaussian mixture model
Abstract A new approach for 2D to 3D garment retexturing is proposed based on Gaussian mixture models and thin plate splines (TPS). An automatically segmented garment of an individual is matched to a new source garment and rendered, resulting in augmented images in which the target garment has been retextured using the texture of the source garment. We divide the problem into garment boundary matching, based on Gaussian mixture models, and interpolation of inner points using surface topology extracted through geodesic paths, which leads to more realistic results than standard approaches. We evaluated and compared our system quantitatively using the root mean square (RMS) error and qualitatively using the mean opinion score (MOS), showing the benefits of the proposed methodology on our gathered dataset.
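The TPS interpolation step can be sketched with SciPy's radial basis function interpolator, assuming the boundary correspondences have already been established by the GMM matching.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_warp(src_boundary, dst_boundary, interior_pts):
    # src_boundary, dst_boundary: (N, 2) matched boundary points;
    # interior_pts: (M, 2) inner points of the source garment.
    tps = RBFInterpolator(src_boundary, dst_boundary,
                          kernel="thin_plate_spline")
    return tps(interior_pts)  # (M, 2) warped positions in the target garment
```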
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; ISE; 600.098; 600.119; 602.133 Approved no
Call Number Admin @ si @ AME2019 Serial 3317
Permanent link to this record
 

 
Author Jon Almazan; Alicia Fornes; Ernest Valveny
Title A non-rigid appearance model for shape description and recognition Type Journal Article
Year 2012 Publication Pattern Recognition Abbreviated Journal PR
Volume 45 Issue 9 Pages 3105-3113
Keywords Shape recognition; Deformable models; Shape modeling; Hand-drawn recognition
Abstract In this paper we describe a framework to learn a model of shape variability in a set of patterns. The framework is based on the Active Appearance Model (AAM) and makes it possible to combine shape deformations with appearance variability. We use two modifications of the Blurred Shape Model (BSM) descriptor as basic shape and appearance features to learn the model. These modifications overcome the rigidity of the original BSM, adapting it to the deformations of the shape to be represented. We have applied this framework to the representation and classification of handwritten digits and symbols, and show that the results of the proposed methodology outperform the original BSM approach.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number DAG @ dag @ AFV2012 Serial 1982
Permanent link to this record
 

 
Author Gemma Roig; Xavier Boix; F. de la Torre; Joan Serrat; C. Vilella
Title Hierarchical CRF with product label spaces for parts-based Models Type Conference Article
Year 2011 Publication IEEE Conference on Automatic Face and Gesture Recognition Abbreviated Journal
Volume Issue Pages 657-664
Keywords Shape; Computational modeling; Principal component analysis; Random variables; Color; Upper bound; Facial features
Abstract Non-rigid object detection is a challenging and open research problem in computer vision. It is a critical part of many applications such as image search, surveillance, human-computer interaction, and image auto-annotation. Most successful approaches to non-rigid object detection make use of part-based models. In particular, Conditional Random Fields (CRFs) have been successfully embedded into a discriminative parts-based model framework due to their effectiveness for learning and inference (usually based on a tree structure). However, CRF-based approaches do not incorporate global constraints and only model pairwise interactions. This is especially important when modeling object classes that may have complex parts interactions (e.g. facial features or body articulations), because neglecting them yields an oversimplified model with suboptimal performance. To overcome this limitation, this paper proposes a novel hierarchical CRF (HCRF). The main contribution is to build a hierarchy of part combinations by extending the label set to a hierarchy of product label spaces. In order to keep the inference computation tractable, we propose an effective method to reduce the new label set. We test our method on two applications: facial feature detection on the Multi-PIE database and human pose estimation on the Buffy dataset.
Address Santa Barbara, CA, USA, 2011
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference FG
Notes ADAS Approved no
Call Number Admin @ si @ RBT2011 Serial 1862
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera; Vassilis Athitsos; Mohammad Sabokrou
Title All You Need In Sign Language Production Type Miscellaneous
Year 2022 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords Sign Language Production; Sign Language Recognition; Sign Language Translation; Deep Learning; Survey; Deaf
Abstract Sign language is the dominant form of communication used in the deaf and hearing-impaired community. To enable easy and mutual communication between the hearing-impaired and hearing communities, building a robust system capable of translating spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are the two necessary parts of such a two-way system, and both need to cope with critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. To give a more realistic perspective on sign language, we present an introduction to Deaf culture, Deaf centers, the psychological perspective of sign language, and the main differences between spoken language and sign language. Furthermore, we present the fundamental components of a bi-directional sign language translation system and discuss the main challenges in this area. The backbone architectures and methods in SLP are briefly introduced, and a proposed taxonomy on SLP is presented. Finally, we present a general framework for SLP and performance evaluation, and discuss recent developments, advantages, and limitations in SLP, commenting on possible lines of future research.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; not mentioned Approved no
Call Number Admin @ si @ RKE2022c Serial 3698
Permanent link to this record
 

 
Author Aura Hernandez-Sabate; Lluis Albarracin; Daniel Calvo; Nuria Gorgorio
Title EyeMath: Identifying Mathematics Problem Solving Processes in a RTS Video Game Type Conference Article
Year 2016 Publication 5th International Conference Games and Learning Alliance Abbreviated Journal
Volume 10056 Issue Pages 50-59
Keywords Simulation environment; Automated Driving; Driver-Vehicle interaction
Abstract Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment, with the aim of highlighting its utility for level 3 driving automation. In particular, an experiment is performed in the presented simulator to explore the influence of different variables regarding control transfer of the car after the system has been driving autonomously in a highway scenario. The results show that the speed of the car at the moment when the system needs to transfer control to the human driver is critical.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference GALA
Notes ADAS; IAM Approved no
Call Number HAC2016 Serial 2864
Permanent link to this record