|   | 
Details
   web
Records
Author Simon Jégou; Michal Drozdzal; David Vazquez; Adriana Romero; Yoshua Bengio
Title The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation Type Conference Article
Year 2017 Publication IEEE Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal
Volume Issue Pages
Keywords Semantic Segmentation
Abstract (up) State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.

Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train.

In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module nor pretraining. Moreover, due to smart construction of the model, our approach has much less parameters than currently published best entries for these datasets.
Address Honolulu; USA; July 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes MILAB; ADAS; 600.076; 600.085; 601.281 Approved no
Call Number ADAS @ adas @ JDV2016 Serial 2866
Permanent link to this record
 

 
Author Meysam Madadi; Sergio Escalera; Alex Carruesco; Carlos Andujar; Xavier Baro; Jordi Gonzalez
Title Occlusion Aware Hand Pose Recovery from Sequences of Depth Images Type Conference Article
Year 2017 Publication 12th IEEE International Conference on Automatic Face and Gesture Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) State-of-the-art approaches on hand pose estimation from depth images have reported promising results under quite controlled considerations. In this paper we propose a two-step pipeline for recovering the hand pose from a sequence of depth images. The pipeline has been designed to deal with images taken from any viewpoint and exhibiting a high degree of finger occlusion. In a first step we initialize the hand pose using a part-based model, fitting a set of hand components in the depth images. In a second step we consider temporal data and estimate the parameters of a trained bilinear model consisting of shape and trajectory bases. Results on a synthetic, highly-occluded dataset demonstrate that the proposed method outperforms most recent pose recovering approaches, including those based on CNNs.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference FG
Notes HUPBA; ISE; 602.143; 600.098; 600.119 Approved no
Call Number Admin @ si @ MEC2017 Serial 2970
Permanent link to this record
 

 
Author Meysam Madadi; Sergio Escalera; Alex Carruesco Llorens; Carlos Andujar; Xavier Baro; Jordi Gonzalez
Title Top-down model fitting for hand pose recovery in sequences of depth images Type Journal Article
Year 2018 Publication Image and Vision Computing Abbreviated Journal IMAVIS
Volume 79 Issue Pages 63-75
Keywords
Abstract (up) State-of-the-art approaches on hand pose estimation from depth images have reported promising results under quite controlled considerations. In this paper we propose a two-step pipeline for recovering the hand pose from a sequence of depth images. The pipeline has been designed to deal with images taken from any viewpoint and exhibiting a high degree of finger occlusion. In a first step we initialize the hand pose using a part-based model, fitting a set of hand components in the depth images. In a second step we consider temporal data and estimate the parameters of a trained bilinear model consisting of shape and trajectory bases. We evaluate our approach on a new created synthetic hand dataset along with NYU and MSRA real datasets. Results demonstrate that the proposed method outperforms the most recent pose recovering approaches, including those based on CNNs.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; 600.098 Approved no
Call Number Admin @ si @ MEC2018 Serial 3203
Permanent link to this record
 

 
Author Jiaolong Xu; David Vazquez; Antonio Lopez; Javier Marin; Daniel Ponsa
Title Learning a Multiview Part-based Model in Virtual World for Pedestrian Detection Type Conference Article
Year 2013 Publication IEEE Intelligent Vehicles Symposium Abbreviated Journal
Volume Issue Pages 467 - 472
Keywords Pedestrian Detection; Virtual World; Part based
Abstract (up) State-of-the-art deformable part-based models based on latent SVM have shown excellent results on human detection. In this paper, we propose to train a multiview deformable part-based model with automatically generated part examples from virtual-world data. The method is efficient as: (i) the part detectors are trained with precisely extracted virtual examples, thus no latent learning is needed, (ii) the multiview pedestrian detector enhances the performance of the pedestrian root model, (iii) a top-down approach is used for part detection which reduces the searching space. We evaluate our model on Daimler and Karlsruhe Pedestrian Benchmarks with publicly available Caltech pedestrian detection evaluation framework and the result outperforms the state-of-the-art latent SVM V4.0, on both average miss rate and speed (our detector is ten times faster).
Address Gold Coast; Australia; June 2013
Corporate Author Thesis
Publisher IEEE Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1931-0587 ISBN 978-1-4673-2754-1 Medium
Area Expedition Conference IV
Notes ADAS; 600.054; 600.057 Approved no
Call Number XVL2013; ADAS @ adas @ xvl2013a Serial 2214
Permanent link to this record
 

 
Author Volkmar Frinken; Markus Baumgartner; Andreas Fischer; Horst Bunke
Title Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting Type Conference Article
Year 2012 Publication 13th International Conference on Frontiers in Handwriting Recognition Abbreviated Journal
Volume Issue Pages 49-54
Keywords
Abstract (up) State-of-the-art handwriting recognition systems are learning-based systems that require large sets of training data. The creation of training data, and consequently the creation of a well-performing recognition system, requires therefore a substantial amount of human work. This can be reduced with semi-supervised learning, which uses unlabeled text lines for training as well. Current approaches estimate the correct transcription of the unlabeled data via handwriting recognition which is not only extremely demanding as far as computational costs are concerned but also requires a good model of the target language. In this paper, we propose a different approach that makes use of keyword spotting, which is significantly faster and does not need any language model. In a set of experiments we demonstrate its superiority over existing approaches.
Address Bari, Italy
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 10.1109/ICFHR.2012.268 ISBN 978-1-4673-2262-1 Medium
Area Expedition Conference ICFHR
Notes DAG Approved no
Call Number Admin @ si @ FBF2012 Serial 2055
Permanent link to this record
 

 
Author Fahad Shahbaz Khan; Muhammad Anwer Rao; Joost Van de Weijer; Andrew Bagdanov; Maria Vanrell; Antonio Lopez
Title Color Attributes for Object Detection Type Conference Article
Year 2012 Publication 25th IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 3306-3313
Keywords pedestrian detection
Abstract (up) State-of-the-art object detectors typically use shape information as a low level feature representation to capture the local structure of an object. This paper shows that early fusion of shape and color, as is popular in image classification,
leads to a significant drop in performance for object detection. Moreover, such approaches also yields suboptimal results for object categories with varying importance of color and shape.
In this paper we propose the use of color attributes as an explicit color representation for object detection. Color attributes are compact, computationally efficient, and when combined with traditional shape features provide state-ofthe-
art results for object detection. Our method is tested on the PASCAL VOC 2007 and 2009 datasets and results clearly show that our method improves over state-of-the-art techniques despite its simplicity. We also introduce a new dataset consisting of cartoon character images in which color plays a pivotal role. On this dataset, our approach yields a significant gain of 14% in mean AP over conventional state-of-the-art methods.
Address Providence; Rhode Island; USA;
Corporate Author Thesis
Publisher IEEE Xplore Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1063-6919 ISBN 978-1-4673-1226-4 Medium
Area Expedition Conference CVPR
Notes ADAS; CIC; Approved no
Call Number Admin @ si @ KRW2012 Serial 1935
Permanent link to this record
 

 
Author Carles Fernandez; Pau Baiget; Xavier Roca; Jordi Gonzalez
Title Determining the Best Suited Semantic Events for Cognitive Surveillance Type Journal Article
Year 2011 Publication Expert Systems with Applications Abbreviated Journal EXSY
Volume 38 Issue 4 Pages 4068–4079
Keywords Cognitive surveillance; Event modeling; Content-based video retrieval; Ontologies; Advanced user interfaces
Abstract (up) State-of-the-art systems on cognitive surveillance identify and describe complex events in selected domains, thus providing end-users with tools to easily access the contents of massive video footage. Nevertheless, as the complexity of events increases in semantics and the types of indoor/outdoor scenarios diversify, it becomes difficult to assess which events describe better the scene, and how to model them at a pixel level to fulfill natural language requests. We present an ontology-based methodology that guides the identification, step-by-step modeling, and generalization of the most relevant events to a specific domain. Our approach considers three steps: (1) end-users provide textual evidence from surveilled video sequences; (2) transcriptions are analyzed top-down to build the knowledge bases for event description; and (3) the obtained models are used to generalize event detection to different image sequences from the surveillance domain. This framework produces user-oriented knowledge that improves on existing advanced interfaces for video indexing and retrieval, by determining the best suited events for video understanding according to end-users. We have conducted experiments with outdoor and indoor scenes showing thefts, chases, and vandalism, demonstrating the feasibility and generalization of this proposal.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ FBR2011a Serial 1722
Permanent link to this record
 

 
Author Fahad Shahbaz Khan; Joost Van de Weijer; Sadiq Ali; Michael Felsberg
Title Evaluating the impact of color on texture recognition Type Conference Article
Year 2013 Publication 15th International Conference on Computer Analysis of Images and Patterns Abbreviated Journal
Volume 8047 Issue Pages 154-162
Keywords Color; Texture; image representation
Abstract (up) State-of-the-art texture descriptors typically operate on grey scale images while ignoring color information. A common way to obtain a joint color-texture representation is to combine the two visual cues at the pixel level. However, such an approach provides sub-optimal results for texture categorisation task.
In this paper we investigate how to optimally exploit color information for texture recognition. We evaluate a variety of color descriptors, popular in image classification, for texture categorisation. In addition we analyze different fusion approaches to combine color and texture cues. Experiments are conducted on the challenging scenes and 10 class texture datasets. Our experiments clearly suggest that in all cases color names provide the best performance. Late fusion is the best strategy to combine color and texture. By selecting the best color descriptor with optimal fusion strategy provides a gain of 5% to 8% compared to texture alone on scenes and texture datasets.
Address York; UK; August 2013
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-40260-9 Medium
Area Expedition Conference CAIP
Notes CIC; 600.048 Approved no
Call Number Admin @ si @ KWA2013 Serial 2263
Permanent link to this record
 

 
Author German Barquero; Sergio Escalera; Cristina Palmero
Title BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction Type Conference Article
Year 2023 Publication IEEE/CVF International Conference on Computer Vision (ICCV) Workshops Abbreviated Journal
Volume Issue Pages 2317-2327
Keywords
Abstract (up) Stochastic human motion prediction (HMP) has generally been tackled with generative adversarial networks and variational autoencoders. Most prior works aim at predicting highly diverse movements in terms of the skeleton joints’ dispersion. This has led to methods predicting fast and motion-divergent movements, which are often unrealistic and incoherent with past motion. Such methods also neglect contexts that need to anticipate diverse low-range behaviors, or actions, with subtle joint displacements. To address these issues, we present BeLFusion, a model that, for the first time, leverages latent diffusion models in HMP to sample from a latent space where behavior is disentangled from pose and motion. As a result, diversity is encouraged from a behavioral perspective. Thanks to our behavior
coupler’s ability to transfer sampled behavior to ongoing motion, BeLFusion’s predictions display a variety of behaviors that are significantly more realistic than the state of the art. To support it, we introduce two metrics, the Area of
the Cumulative Motion Distribution, and the Average Pairwise Distance Error, which are correlated to our definition of realism according to a qualitative study with 126 participants. Finally, we prove BeLFusion’s generalization power in a new cross-dataset scenario for stochastic HMP.
Address 2-6 October 2023. Paris (France)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV
Notes HUPBA; no menciona Approved no
Call Number Admin @ si @ BEP2023 Serial 3829
Permanent link to this record
 

 
Author Juan Jose Rubio; Takahiro Kashiwa; Teera Laiteerapong; Wenlong Deng; Kohei Nagai; Sergio Escalera; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger
Title Multi-class structural damage segmentation using fully convolutional networks Type Journal Article
Year 2019 Publication Computers in Industry Abbreviated Journal COMPUTIND
Volume 112 Issue Pages 103121
Keywords Bridge damage detection; Deep learning; Semantic segmentation
Abstract (up) Structural Health Monitoring (SHM) has benefited from computer vision and more recently, Deep Learning approaches, to accurately estimate the state of deterioration of infrastructure. In our work, we test Fully Convolutional Networks (FCNs) with a dataset of deck areas of bridges for damage segmentation. We create a dataset for delamination and rebar exposure that has been collected from inspection records of bridges in Niigata Prefecture, Japan. The dataset consists of 734 images with three labels per image, which makes it the largest dataset of images of bridge deck damage. This data allows us to estimate the performance of our method based on regions of agreement, which emulates the uncertainty of in-field inspections. We demonstrate the practicality of FCNs to perform automated semantic segmentation of surface damages. Our model achieves a mean accuracy of 89.7% for delamination and 78.4% for rebar exposure, and a weighted F1 score of 81.9%.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ RKL2019 Serial 3315
Permanent link to this record
 

 
Author Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger
Title Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network Type Journal Article
Year 2020 Publication Automation in Construction Abbreviated Journal AC
Volume 110 Issue Pages 102973
Keywords Semantic image segmentation; Deep learning
Abstract (up) Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches are widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of bridge surface structural damage detection, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damages requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area could be very small, whereas the background area is large, which creates an unbalanced training environment; (iii) due to the difficulty to exactly determine the extension of the damage, there is often a variation among different labelers who perform pixel-wise labeling. In this paper, we propose a novel model for bridge structural damage detection to address the first two challenges. This paper follows the idea of an atrous spatial pyramid pooling (ASPP) module that is designed as a novel network for bridge damage detection. Further, we introduce the weight balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection, as compared to cross entropy loss or focal loss, and (ii) the proposed model has a better ability to detect a minority class than other light segmentation networks.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ DMK2020 Serial 3314
Permanent link to this record
 

 
Author Muhammad Muzzamil Luqman; Jean-Yves Ramel; Josep Llados; Thierry Brouard
Title Fuzzy Multilevel Graph Embedding Type Journal Article
Year 2013 Publication Pattern Recognition Abbreviated Journal PR
Volume 46 Issue 2 Pages 551-565
Keywords Pattern recognition; Graphics recognition; Graph clustering; Graph classification; Explicit graph embedding; Fuzzy logic
Abstract (up) Structural pattern recognition approaches offer the most expressive, convenient, powerful but computational expensive representations of underlying relational information. To benefit from mature, less expensive and efficient state-of-the-art machine learning models of statistical pattern recognition they must be mapped to a low-dimensional vector space. Our method of explicit graph embedding bridges the gap between structural and statistical pattern recognition. We extract the topological, structural and attribute information from a graph and encode numeric details by fuzzy histograms and symbolic details by crisp histograms. The histograms are concatenated to achieve a simple and straightforward embedding of graph into a low-dimensional numeric feature vector. Experimentation on standard public graph datasets shows that our method outperforms the state-of-the-art methods of graph embedding for richly attributed graphs.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.042; 600.045; 605.203 Approved no
Call Number Admin @ si @ LRL2013a Serial 2270
Permanent link to this record
 

 
Author Jaume Gibert; Ernest Valveny; Horst Bunke
Title Graph of Words Embedding for Molecular Structure-Activity Relationship Analysis Type Conference Article
Year 2010 Publication 15th Iberoamerican Congress on Pattern Recognition Abbreviated Journal
Volume 6419 Issue Pages 30–37
Keywords
Abstract (up) Structure-Activity relationship analysis aims at discovering chemical activity of molecular compounds based on their structure. In this article we make use of a particular graph representation of molecules and propose a new graph embedding procedure to solve the problem of structure-activity relationship analysis. The embedding is essentially an arrangement of a molecule in the form of a vector by considering frequencies of appearing atoms and frequencies of covalent bonds between them. Results on two benchmark databases show the effectiveness of the proposed technique in terms of recognition accuracy while avoiding high operational costs in the transformation.
Address Sao Paulo, Brazil
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-16686-0 Medium
Area Expedition Conference CIARP
Notes DAG Approved no
Call Number DAG @ dag @ GVB2010 Serial 1462
Permanent link to this record
 

 
Author Wenwen Yu; Chengquan Zhang; Haoyu Cao; Wei Hua; Bohan Li; Huang Chen; Mingyu Liu; Mingrui Chen; Jianfeng Kuang; Mengjun Cheng; Yuning Du; Shikun Feng; Xiaoguang Hu; Pengyuan Lyu; Kun Yao; Yuechen Yu; Yuliang Liu; Wanxiang Che; Errui Ding; Cheng-Lin Liu; Jiebo Luo; Shuicheng Yan; Min Zhang; Dimosthenis Karatzas; Xing Sun; Jingdong Wang; Xiang Bai
Title ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images Type Conference Article
Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 14188 Issue Pages 536–552
Keywords
Abstract (up) Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios of past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of the structured text extraction scheme. In order to eliminate these problems, we organized the ICDAR 2023 competition on Structured text extraction from Visually-Rich Document images (SVRD). We set up two tracks for SVRD including Track 1: HUST-CELL and Track 2: Baidu-FEST, where HUST-CELL aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling, and Baidu-FEST focuses on evaluating the performance and generalization of Zero-shot/Few-shot Structured Text extraction from an end-to-end perspective. Compared to the current document benchmarks, our two tracks of competition benchmark enriches the scenarios greatly and contains more than 50 types of visually-rich document images (mainly from the actual enterprise applications). The competition opened on 30th December, 2022 and closed on 24th March, 2023. There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2. In this report we will presents the motivation, competition datasets, task definition, evaluation protocol, and submission summaries. According to the performance of the submissions, we believe there is still a large gap on the expected information extraction performance for complex and zero-shot scenarios. It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.
Address San Jose; CA; USA; August 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ YZC2023 Serial 3896
Permanent link to this record
 

 
Author Katerine Diaz; Francesc J. Ferri; W. Diaz
Title Incremental Generalized Discriminative Common Vectors for Image Classification Type Journal Article
Year 2015 Publication IEEE Transactions on Neural Networks and Learning Systems Abbreviated Journal TNNLS
Volume 26 Issue 8 Pages 1761 - 1775
Keywords
Abstract (up) Subspace-based methods have become popular due to their ability to appropriately represent complex data in such a way that both dimensionality is reduced and discriminativeness is enhanced. Several recent works have concentrated on the discriminative common vector (DCV) method and other closely related algorithms also based on the concept of null space. In this paper, we present a generalized incremental formulation of the DCV methods, which allows the update of a given model by considering the addition of new examples even from unseen classes. Having efficient incremental formulations of well-behaved batch algorithms allows us to conveniently adapt previously trained classifiers without the need of recomputing them from scratch. The proposed generalized incremental method has been empirically validated in different case studies from different application domains (faces, objects, and handwritten digits) considering several different scenarios in which new data are continuously added at different rates starting from an initial model.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 2162-237X ISBN Medium
Area Expedition Conference
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ DFD2015 Serial 2547
Permanent link to this record