toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera edit  doi
openurl 
  Title Combining Local and Global Learners in the Pairwise Multiclass Classification Type Journal Article
  Year 2015 Publication Pattern Analysis and Applications Abbreviated Journal PAA  
  Volume 18 Issue 4 Pages 845-860  
  Keywords Multiclass classification; Pairwise approach; One-versus-one  
  Abstract Pairwise classification is a well-known class binarization technique that converts a multiclass problem into a number of two-class problems, one problem for each pair of classes. However, in the pairwise technique, nuisance votes of many irrelevant classifiers may result in a wrong class prediction. To overcome this problem, a simple, but efficient method is proposed and evaluated in this paper. The proposed method is based on excluding some classes and focusing on the most probable classes in the neighborhood space, named Local Crossing Off (LCO). This procedure is performed by employing a modified version of standard K-nearest neighbor and large margin nearest neighbor algorithms. The LCO method takes advantage of nearest neighbor classification algorithm because of its local learning behavior as well as the global behavior of powerful binary classifiers to discriminate between two classes. Combining these two properties in the proposed LCO technique will avoid the weaknesses of each method and will increase the efficiency of the whole classification system. On several benchmark datasets of varying size and difficulty, we found that the LCO approach leads to significant improvements using different base learners. The experimental results show that the proposed technique not only achieves better classification accuracy in comparison to other standard approaches, but also is computationally more efficient for tackling classification problems which have a relatively large number of target classes.  
  Address  
  Corporate Author Thesis  
  Publisher Springer London Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1433-7541 ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA;MILAB Approved no  
  Call Number (up) Admin @ si @ BGE2014 Serial 2441  
Permanent link to this record
 

 
Author Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera; Huamin Ren; Thomas B. Moeslund; Elham Etemad edit  url
openurl 
  Title Locality Regularized Group Sparse Coding for Action Recognition Type Journal Article
  Year 2017 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU  
  Volume 158 Issue Pages 106-114  
  Keywords Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition  
  Abstract Bag of visual words (BoVW) models are widely utilized in image/ video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantages of locality coding such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate the outperformance of our methodology compared to the state-of-the-art on the considered datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; no proj Approved no  
  Call Number (up) Admin @ si @ BGE2017 Serial 3014  
Permanent link to this record
 

 
Author Sonia Baeza; Debora Gil; I.Garcia Olive; M.Salcedo; J.Deportos; Carles Sanchez; Guillermo Torres; G.Moragas; Antoni Rosell edit  doi
openurl 
  Title A novel intelligent radiomic analysis of perfusion SPECT/CT images to optimize pulmonary embolism diagnosis in COVID-19 patients Type Journal Article
  Year 2022 Publication EJNMMI Physics Abbreviated Journal EJNMMI-PHYS  
  Volume 9 Issue 1, Article 84 Pages 1-17  
  Keywords  
  Abstract Background: COVID-19 infection, especially in cases with pneumonia, is associated with a high rate of pulmonary embolism (PE). In patients with contraindications for CT pulmonary angiography (CTPA) or non-diagnostic CTPA, perfusion single-photon emission computed tomography/computed tomography (Q-SPECT/CT) is a diagnostic alternative. The goal of this study is to develop a radiomic diagnostic system to detect PE based only on the analysis of Q-SPECT/CT scans.
Methods: This radiomic diagnostic system is based on a local analysis of Q-SPECT/CT volumes that includes both CT and Q-SPECT values for each volume point. We present a combined approach that uses radiomic features extracted from each scan as input into a fully connected classifcation neural network that optimizes a weighted crossentropy loss trained to discriminate between three diferent types of image patterns (pixel sample level): healthy lungs (control group), PE and pneumonia. Four types of models using diferent confguration of parameters were tested.
Results: The proposed radiomic diagnostic system was trained on 20 patients (4,927 sets of samples of three types of image patterns) and validated in a group of 39 patients (4,410 sets of samples of three types of image patterns). In the training group, COVID-19 infection corresponded to 45% of the cases and 51.28% in the test group. In the test group, the best model for determining diferent types of image patterns with PE presented a sensitivity, specifcity, positive predictive value and negative predictive value of 75.1%, 98.2%, 88.9% and 95.4%, respectively. The best model for detecting
pneumonia presented a sensitivity, specifcity, positive predictive value and negative predictive value of 94.1%, 93.6%, 85.2% and 97.6%, respectively. The area under the curve (AUC) was 0.92 for PE and 0.91 for pneumonia. When the results obtained at the pixel sample level are aggregated into regions of interest, the sensitivity of the PE increases to 85%, and all metrics improve for pneumonia.
Conclusion: This radiomic diagnostic system was able to identify the diferent lung imaging patterns and is a frst step toward a comprehensive intelligent radiomic system to optimize the diagnosis of PE by Q-SPECT/CT.
 
  Address 5 dec 2022  
  Corporate Author Thesis  
  Publisher Springer Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM Approved no  
  Call Number (up) Admin @ si @ BGG2022 Serial 3759  
Permanent link to this record
 

 
Author Dena Bazazian; Raul Gomez; Anguelos Nicolaou; Lluis Gomez; Dimosthenis Karatzas; Andrew Bagdanov edit   pdf
url  openurl
  Title Fast: Facilitated and accurate scene text proposals through fcn guided pruning Type Journal Article
  Year 2019 Publication Pattern Recognition Letters Abbreviated Journal PRL  
  Volume 119 Issue Pages 112-120  
  Keywords  
  Abstract Class-specific text proposal algorithms can efficiently reduce the search space for possible text object locations in an image. In this paper we combine the Text Proposals algorithm with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same recall level and thus gaining a significant speed up. Our experiments demonstrate that such text proposal approaches yield significantly higher recall rates than state-of-the-art text localization techniques, while also producing better-quality localizations. Our results on the ICDAR 2015 Robust Reading Competition (Challenge 4) and the COCO-text datasets show that, when combined with strong word classifiers, this recall margin leads to state-of-the-art results in end-to-end scene text recognition.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.084; 600.121; 600.129 Approved no  
  Call Number (up) Admin @ si @ BGN2019 Serial 3342  
Permanent link to this record
 

 
Author Xavier Boix; Josep M. Gonfaus; Joost Van de Weijer; Andrew Bagdanov; Joan Serrat; Jordi Gonzalez edit   pdf
url  doi
openurl 
  Title Harmony Potentials: Fusing Global and Local Scale for Semantic Image Segmentation Type Journal Article
  Year 2012 Publication International Journal of Computer Vision Abbreviated Journal IJCV  
  Volume 96 Issue 1 Pages 83-102  
  Keywords  
  Abstract The Hierarchical Conditional Random Field(HCRF) model have been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales.
At higher scales in the image, this representation yields an oversimpli ed model since multiple classes can be reasonably expected to appear within large regions. This simpli ed model particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting this information is undesirable. To
address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combi-
nation of labels, penalizing only unlikely combinations of classes. We also propose an e ective sampling strategy over this expanded label set that renders tractable the underlying optimization problem. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010, and MSRC-21.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 0920-5691 ISBN Medium  
  Area Expedition Conference  
  Notes ISE;CIC;ADAS Approved no  
  Call Number (up) Admin @ si @ BGW2012 Serial 1718  
Permanent link to this record
 

 
Author Miguel Angel Bautista; Antonio Hernandez; Sergio Escalera; Laura Igual; Oriol Pujol; Josep Moya; Veronica Violant; Maria Teresa Anguera edit   pdf
doi  openurl
  Title A Gesture Recognition System for Detecting Behavioral Patterns of ADHD Type Journal Article
  Year 2016 Publication IEEE Transactions on System, Man and Cybernetics, Part B Abbreviated Journal TSMCB  
  Volume 46 Issue 1 Pages 136-147  
  Keywords Gesture Recognition; ADHD; Gaussian Mixture Models; Convex Hulls; Dynamic Time Warping; Multi-modal RGB-Depth data  
  Abstract We present an application of gesture recognition using an extension of Dynamic Time Warping (DTW) to recognize behavioural patterns of Attention Deficit Hyperactivity Disorder (ADHD). We propose an extension of DTW using one-class classifiers in order to be able to encode the variability of a gesture category, and thus, perform an alignment between a gesture sample and a gesture class. We model the set of gesture samples of a certain gesture category using either GMMs or an approximation of Convex Hulls. Thus, we add a theoretical contribution to classical warping path in DTW by including local modeling of intra-class gesture variability. This methodology is applied in a clinical context, detecting a group of ADHD behavioural patterns defined by experts in psychology/psychiatry, to provide support to clinicians in the diagnose procedure. The proposed methodology is tested on a novel multi-modal dataset (RGB plus Depth) of ADHD children recordings with behavioural patterns. We obtain satisfying results when compared to standard state-of-the-art approaches in the DTW context.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; MILAB; Approved no  
  Call Number (up) Admin @ si @ BHE2016 Serial 2566  
Permanent link to this record
 

 
Author Jorge Bernal; Aymeric Histace; Marc Masana; Quentin Angermann; Cristina Sanchez Montes; Cristina Rodriguez de Miguel; Maroua Hammami; Ana Garcia Rodriguez; Henry Cordova; Olivier Romain; Gloria Fernandez Esparrach; Xavier Dray; F. Javier Sanchez edit   pdf
doi  openurl
  Title GTCreator: a flexible annotation tool for image-based datasets Type Journal Article
  Year 2019 Publication International Journal of Computer Assisted Radiology and Surgery Abbreviated Journal IJCAR  
  Volume 14 Issue 2 Pages 191–201  
  Keywords Annotation tool; Validation Framework; Benchmark; Colonoscopy; Evaluation  
  Abstract Abstract Purpose: Methodology evaluation for decision support systems for health is a time consuming-task. To assess performance of polyp detection
methods in colonoscopy videos, clinicians have to deal with the annotation
of thousands of images. Current existing tools could be improved in terms of
exibility and ease of use. Methods:We introduce GTCreator, a exible annotation tool for providing image and text annotations to image-based datasets.
It keeps the main basic functionalities of other similar tools while extending
other capabilities such as allowing multiple annotators to work simultaneously
on the same task or enhanced dataset browsing and easy annotation transfer aiming to speed up annotation processes in large datasets. Results: The
comparison with other similar tools shows that GTCreator allows to obtain
fast and precise annotation of image datasets, being the only one which offers
full annotation editing and browsing capabilites. Conclusions: Our proposed
annotation tool has been proven to be efficient for large image dataset annota-
tion, as well as showing potential of use in other stages of method evaluation
such as experimental setup or results analysis.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MV; 600.096; 600.109; 600.119; 601.305 Approved no  
  Call Number (up) Admin @ si @ BHM2019 Serial 3163  
Permanent link to this record
 

 
Author Sumit K. Banchhor; Narendra D. Londhe; Tadashi Araki; Luca Saba; Petia Radeva; Narendra N. Khanna; Jasjit S. Suri edit  url
openurl 
  Title Calcium detection, its quantification, and grayscale morphology-based risk stratification using machine learning in multimodality big data coronary and carotid scans: A review. Type Journal Article
  Year 2018 Publication Computers in Biology and Medicine Abbreviated Journal CBM  
  Volume 101 Issue Pages 184-198  
  Keywords Heart disease; Stroke; Atherosclerosis; Intravascular; Coronary; Carotid; Calcium; Morphology; Risk stratification  
  Abstract Purpose of review

Atherosclerosis is the leading cause of cardiovascular disease (CVD) and stroke. Typically, atherosclerotic calcium is found during the mature stage of the atherosclerosis disease. It is therefore often a challenge to identify and quantify the calcium. This is due to the presence of multiple components of plaque buildup in the arterial walls. The American College of Cardiology/American Heart Association guidelines point to the importance of calcium in the coronary and carotid arteries and further recommend its quantification for the prevention of heart disease. It is therefore essential to stratify the CVD risk of the patient into low- and high-risk bins.
Recent finding

Calcium formation in the artery walls is multifocal in nature with sizes at the micrometer level. Thus, its detection requires high-resolution imaging. Clinical experience has shown that even though optical coherence tomography offers better resolution, intravascular ultrasound still remains an important imaging modality for coronary wall imaging. For a computer-based analysis system to be complete, it must be scientifically and clinically validated. This study presents a state-of-the-art review (condensation of 152 publications after examining 200 articles) covering the methods for calcium detection and its quantification for coronary and carotid arteries, the pros and cons of these methods, and the risk stratification strategies. The review also presents different kinds of statistical models and gold standard solutions for the evaluation of software systems useful for calcium detection and quantification. Finally, the review concludes with a possible vision for designing the next-generation system for better clinical outcomes.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; no proj Approved no  
  Call Number (up) Admin @ si @ BLA2018 Serial 3188  
Permanent link to this record
 

 
Author Fernando Barrera; Felipe Lumbreras; Angel Sappa edit   pdf
doi  openurl
  Title Multimodal Stereo Vision System: 3D Data Extraction and Algorithm Evaluation Type Journal Article
  Year 2012 Publication IEEE Journal of Selected Topics in Signal Processing Abbreviated Journal J-STSP  
  Volume 6 Issue 5 Pages 437-446  
  Keywords  
  Abstract This paper proposes an imaging system for computing sparse depth maps from multispectral images. A special stereo head consisting of an infrared and a color camera defines the proposed multimodal acquisition system. The cameras are rigidly attached so that their image planes are parallel. Details about the calibration and image rectification procedure are provided. Sparse disparity maps are obtained by the combined use of mutual information enriched with gradient information. The proposed approach is evaluated using a Receiver Operating Characteristics curve. Furthermore, a multispectral dataset, color and infrared images, together with their corresponding ground truth disparity maps, is generated and used as a test bed. Experimental results in real outdoor scenarios are provided showing its viability and that the proposed approach is not restricted to a specific domain.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1932-4553 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number (up) Admin @ si @ BLS2012b Serial 2155  
Permanent link to this record
 

 
Author Fernando Barrera; Felipe Lumbreras; Angel Sappa edit  url
doi  openurl
  Title Multispectral Piecewise Planar Stereo using Manhattan-World Assumption Type Journal Article
  Year 2013 Publication Pattern Recognition Letters Abbreviated Journal PRL  
  Volume 34 Issue 1 Pages 52-61  
  Keywords Multispectral stereo rig; Dense disparity maps from multispectral stereo; Color and infrared images  
  Abstract This paper proposes a new framework for extracting dense disparity maps from a multispectral stereo rig. The system is constructed with an infrared and a color camera. It is intended to explore novel multispectral stereo matching approaches that will allow further extraction of semantic information. The proposed framework consists of three stages. Firstly, an initial sparse disparity map is generated by using a cost function based on feature matching in a multiresolution scheme. Then, by looking at the color image, a set of planar hypotheses is defined to describe the surfaces on the scene. Finally, the previous stages are combined by reformulating the disparity computation as a global minimization problem. The paper has two main contributions. The first contribution combines mutual information with a shape descriptor based on gradient in a multiresolution scheme. The second contribution, which is based on the Manhattan-world assumption, extracts a dense disparity representation using the graph cut algorithm. Experimental results in outdoor scenarios are provided showing the validity of the proposed framework.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.054; 600.055; 605.203 Approved no  
  Call Number (up) Admin @ si @ BLS2013 Serial 2245  
Permanent link to this record
 

 
Author Francisco Blanco; Felipe Lumbreras; Joan Serrat; Roswitha Siener; Silvia Serranti; Giuseppe Bonifazi; Montserrat Lopez Mesas; Manuel Valiente edit  doi
openurl 
  Title Taking advantage of Hyperspectral Imaging classification of urinary stones against conventional IR Spectroscopy Type Journal Article
  Year 2014 Publication Journal of Biomedical Optics Abbreviated Journal JBiO  
  Volume 19 Issue 12 Pages 126004-1 - 126004-9  
  Keywords  
  Abstract The analysis of urinary stones is mandatory for the best management of the disease after the stone passage in order to prevent further stone episodes. Thus the use of an appropriate methodology for an individualized stone analysis becomes a key factor for giving the patient the most suitable treatment. A recently developed hyperspectral imaging methodology, based on pixel-to-pixel analysis of near-infrared spectral images, is compared to the reference technique in stone analysis, infrared (IR) spectroscopy. The developed classification model yields >90% correct classification rate when compared to IR and is able to precisely locate stone components within the structure of the stone with a 15 µm resolution. Due to the little sample pretreatment, low analysis time, good performance of the model, and the automation of the measurements, they become analyst independent; this methodology can be considered to become a routine analysis for clinical laboratories.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.076 Approved no  
  Call Number (up) Admin @ si @ BLS2014 Serial 2563  
Permanent link to this record
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades edit   pdf
doi  openurl
  Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 139 Issue Pages 109419  
  Keywords  
  Abstract Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space}. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number (up) Admin @ si @ BMC2023 Serial 3826  
Permanent link to this record
 

 
Author Hugo Bertiche; Meysam Madadi; Sergio Escalera edit   pdf
url  openurl
  Title PBNS: Physically Based Neural Simulation for Unsupervised Garment Pose Space Deformation Type Journal Article
  Year 2021 Publication ACM Transactions on Graphics Abbreviated Journal  
  Volume 40 Issue 6 Pages 1-14  
  Keywords  
  Abstract We present a methodology to automatically obtain Pose Space Deformation (PSD) basis for rigged garments through deep learning. Classical approaches rely on Physically Based Simulations (PBS) to animate clothes. These are general solutions that, given a sufficiently fine-grained discretization of space and time, can achieve highly realistic results. However, they are computationally expensive and any scene modification prompts the need of re-simulation. Linear Blend Skinning (LBS) with PSD offers a lightweight alternative to PBS, though, it needs huge volumes of data to learn proper PSD. We propose using deep learning, formulated as an implicit PBS, to unsupervisedly learn realistic cloth Pose Space Deformations in a constrained scenario: dressed humans. Furthermore, we show it is possible to train these models in an amount of time comparable to a PBS of a few sequences. To the best of our knowledge, we are the first to propose a neural simulator for cloth.
While deep-based approaches in the domain are becoming a trend, these are data-hungry models. Moreover, authors often propose complex formulations to better learn wrinkles from PBS data. Supervised learning leads to physically inconsistent predictions that require collision solving to be used. Also, dependency on PBS data limits the scalability of these solutions, while their formulation hinders its applicability and compatibility. By proposing an unsupervised methodology to learn PSD for LBS models (3D animation standard), we overcome both of these drawbacks. Results obtained show cloth-consistency in the animated garments and meaningful pose-dependant folds and wrinkles. Our solution is extremely efficient, handles multiple layers of cloth, allows unsupervised outfit resizing and can be easily applied to any custom 3D avatar.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA; no proj Approved no  
  Call Number (up) Admin @ si @ BME2021c Serial 3643  
Permanent link to this record
 

 
Author Hugo Bertiche; Meysam Madadi; Sergio Escalera edit  doi
openurl 
  Title Neural Cloth Simulation Type Journal Article
  Year 2022 Publication ACM Transactions on Graphics Abbreviated Journal ACMTGraph  
  Volume 41 Issue 6 Pages 1-14  
  Keywords  
  Abstract We present a general framework for the garment animation problem through unsupervised deep learning inspired in physically based simulation. Existing trends in the literature already explore this possibility. Nonetheless, these approaches do not handle cloth dynamics. Here, we propose the first methodology able to learn realistic cloth dynamics unsupervisedly, and henceforth, a general formulation for neural cloth simulation. The key to achieve this is to adapt an existing optimization scheme for motion from simulation based methodologies to deep learning. Then, analyzing the nature of the problem, we devise an architecture able to automatically disentangle static and dynamic cloth subspaces by design. We will show how this improves model performance. Additionally, this opens the possibility of a novel motion augmentation technique that greatly improves generalization. Finally, we show it also allows to control the level of motion in the predictions. This is a useful, never seen before, tool for artists. We provide of detailed analysis of the problem to establish the bases of neural cloth simulation and guide future research into the specifics of this domain.



ACM Transactions on GraphicsVolume 41Issue 6December 2022 Article No.: 220pp 1–
 
  Address Dec 2022  
  Corporate Author Thesis  
  Publisher ACM Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number (up) Admin @ si @ BME2022b Serial 3779  
Permanent link to this record
 

 
Author Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund edit  doi
openurl 
  Title Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification Type Journal Article
  Year 2022 Publication Automation in Construction Abbreviated Journal AC  
  Volume 144 Issue Pages 104614  
  Keywords Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection  
  Abstract A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated at different scales non-locally through the use of a lightweight vision transformer, and a smaller set of tokens was produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.  
  Address Dec 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA Approved no  
  Call Number (up) Admin @ si @ BME2022c Serial 3780  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: