Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, & Thomas B. Moeslund. (2022). Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification. AC - Automation in Construction, 144, 104614.
Abstract: A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated at different scales non-locally through the use of a lightweight vision transformer, and a smaller set of tokens is produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
Keywords: Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection
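Illustrative note: the Sinkhorn-Knopp iteration named in the keywords can be sketched as below. This is a generic, minimal version under stated assumptions and not the paper's tokenizer: the function name `sinkhorn_knopp`, the token-to-cluster score matrix, and the choice of balancing mass equally across clusters are all hypothetical choices made for illustration.

```python
import math


def sinkhorn_knopp(scores, n_iters=50, eps=1e-12):
    """Turn a token-by-cluster score matrix into a balanced soft assignment.

    Assumption for this sketch: every token (row) carries unit mass and
    every cluster (column) should receive the same total mass.
    """
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    P = [[math.exp(s - top) for s in row] for row in scores]
    n_rows, n_cols = len(P), len(P[0])
    target_col = n_rows / n_cols  # equal share of tokens per cluster

    for _ in range(n_iters):
        # Column step: rescale so each cluster receives target_col mass.
        col_sums = [sum(P[i][j] for i in range(n_rows)) for j in range(n_cols)]
        P = [[P[i][j] * target_col / (col_sums[j] + eps) for j in range(n_cols)]
             for i in range(n_rows)]
        # Row step: rescale so each token's assignment sums to one.
        P = [[v / (sum(row) + eps) for v in row] for row in P]
    return P


if __name__ == "__main__":
    # Four hypothetical tokens scored against two hypothetical cluster centers.
    scores = [[2.0, 0.5], [1.5, 1.0], [0.1, 0.3], [0.2, 2.0]]
    for row in sinkhorn_knopp(scores):
        print([round(v, 3) for v in row])
```

Alternating the two rescaling steps is what distinguishes this from a plain row-wise softmax: it discourages all tokens from collapsing onto a single cluster.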
Sergio Escalera, Oriol Pujol, J. Mauri, & Petia Radeva. (2009). Intravascular Ultrasound Tissue Characterization with Sub-class Error-Correcting Output Codes. Journal of Signal Processing Systems, 55(1-3), 35–47.
Abstract: Intravascular ultrasound (IVUS) represents a powerful imaging technique to explore coronary vessels and to study their morphology and histologic properties. In this paper, we characterize different tissues based on radial frequency (RF), texture-based, and combined features. To deal with the classification of multiple tissues, we require the use of robust multi-class learning techniques. In this sense, error-correcting output codes (ECOC) have been shown to robustly combine binary classifiers to solve multi-class problems. In this context, we propose a strategy to model multi-class classification tasks using sub-class information in the ECOC framework. The new strategy splits the classes into different sub-sets according to the applied base classifier. Complex IVUS data sets containing overlapping data are learnt by splitting the original set of classes into sub-classes, and embedding the binary problems in a problem-dependent ECOC design. The method automatically characterizes different tissues, showing performance improvements over the state-of-the-art ECOC techniques for different base classifiers. Furthermore, the combination of RF and texture-based features also shows improvements over the state-of-the-art approaches.
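Illustrative note: the way ECOC combines binary classifiers into a multi-class decision can be sketched as below. This is a generic binary ECOC with Hamming decoding, not the sub-class design proposed in the paper; the function name `ecoc_decode` and the 4-class, 7-column exhaustive code matrix are assumptions chosen for illustration.

```python
def ecoc_decode(code_matrix, predictions):
    """Return the index of the class whose codeword is closest in Hamming distance.

    code_matrix: one row per class, one column per binary classifier (+1 / -1).
    predictions: the +1 / -1 outputs of the binary classifiers for one sample.
    """
    def hamming(codeword):
        # Count the classifiers whose output disagrees with this codeword.
        return sum(1 for c, p in zip(codeword, predictions) if c != p)

    distances = [hamming(codeword) for codeword in code_matrix]
    return min(range(len(code_matrix)), key=distances.__getitem__)


# Hypothetical exhaustive code for 4 classes: any two rows differ in
# 4 positions, so a single wrong binary output can still be corrected.
CODE_MATRIX = [
    [+1, +1, +1, +1, +1, +1, +1],
    [-1, -1, -1, -1, +1, +1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, +1, -1, +1, -1, +1, -1],
]

if __name__ == "__main__":
    # Codeword of class 2 with the first classifier's output flipped.
    noisy = [+1, -1, +1, +1, -1, -1, +1]
    print(ecoc_decode(CODE_MATRIX, noisy))
```

The redundancy in the codewords is what gives the scheme its error-correcting behaviour: a sample is still assigned to the right class when a minority of the binary classifiers are wrong.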
Meysam Madadi, Hugo Bertiche, & Sergio Escalera. (2021). Deep unsupervised 3D human body reconstruction from a sparse set of landmarks. IJCV - International Journal of Computer Vision, 129, 2499–2512.
Abstract: In this paper we propose the first deep unsupervised approach to human body reconstruction, called DeepMurf, which estimates the body surface from a sparse set of landmarks. We apply a denoising autoencoder to estimate missing landmarks. Then we apply an attention model to estimate body joints from landmarks. Finally, a cascading network is applied to regress the parameters of a statistical generative model that reconstructs the body. Our set of proposed loss functions allows us to train the network in an unsupervised way. Results on four public datasets show that our approach accurately reconstructs the human body from real-world mocap data.
Cristina Palmero, Jordi Esquirol, Vanessa Bayo, Miquel Angel Cos, Pouya Ahmadmonfared, Joan Salabert, et al. (2017). Automatic Sleep System Recommendation by Multi-modal RBG-Depth-Pressure Anthropometric Analysis. IJCV - International Journal of Computer Vision, 122(2), 212–227.
Abstract: This paper presents a novel system for automatic sleep system recommendation using RGB, depth and pressure information. It consists of a validated clinical knowledge-based model that, along with a set of prescription variables extracted automatically, obtains a personalized bed design recommendation. The automatic process starts by performing multi-part human body RGB-D segmentation combining GrabCut, 3D Shape Context descriptor and Thin Plate Splines, to then extract a set of anthropometric landmark points by applying orthogonal plates to the segmented human body. The extracted variables are introduced to the computerized clinical model to calculate body circumferences, weight, morphotype and Body Mass Index categorization. Furthermore, pressure image analysis is performed to extract pressure values and at-risk points, which are also introduced to the model to eventually obtain the final prescription of mattress, topper, and pillow. We validate the complete system in a set of 200 subjects, showing accurate category classification and high correlation results with respect to manual measures.
Keywords: Sleep system recommendation; RGB-Depth data; Pressure imaging; Anthropometric landmark extraction; Multi-part human body segmentation
Cristina Palmero, Albert Clapes, Chris Bahnsen, Andreas Møgelmose, Thomas B. Moeslund, & Sergio Escalera. (2016). Multi-modal RGB-Depth-Thermal Human Body Segmentation. IJCV - International Journal of Computer Vision, 118(2), 217–239.
Abstract: This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB–depth–thermal dataset along with a multi-modal segmentation baseline. The several modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, defines a partitioning of the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extractions, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian mixture models for the probabilistic modeling and Random Forest for the stacked learning, is superior to other state-of-the-art methods, obtaining an overlap above 75 % on the novel dataset when compared to the manually annotated ground-truth of human segmentations.
Keywords: Human body segmentation; RGB; Depth; Thermal