|
Cristhian A. Aguilera-Carrasco, Cristhian Aguilera, Cristobal A. Navarro, & Angel Sappa. (2020). Fast CNN Stereo Depth Estimation through Embedded GPU Devices. SENS - Sensors, 20(11), 3249.
Abstract: Current CNN-based stereo depth estimation models can barely run under real-time constraints on embedded graphic processing unit (GPU) devices. Moreover, state-of-the-art evaluations usually do not consider model optimization techniques, being that it is unknown what is the current potential on embedded GPU devices. In this work, we evaluate two state-of-the-art models on three different embedded GPU devices, with and without optimization methods, presenting performance results that illustrate the actual capabilities of embedded GPU devices for stereo depth estimation. More importantly, based on our evaluation, we propose the use of a U-Net like architecture for postprocessing the cost-volume, instead of a typical sequence of 3D convolutions, drastically augmenting the runtime speed of current models. In our experiments, we achieve real-time inference speed, in the range of 5–32 ms, for 1216 × 368 input stereo images on the Jetson TX2, Jetson Xavier, and Jetson Nano embedded devices.
Keywords: stereo matching; deep learning; embedded GPU
|
|
|
Xavier Soria, Gonzalo Pomboza-Junez, & Angel Sappa. (2022). LDC: Lightweight Dense CNN for Edge Detection. ACCESS - IEEE Access, 10, 68281–68290.
Abstract: This paper presents a Lightweight Dense Convolutional (LDC) neural network for edge detection. The proposed model is an adaptation of two state-of-the-art approaches, but it requires less than 4% of parameters in comparison with these approaches. The proposed architecture generates thin edge maps and reaches the highest score (i.e., ODS) when compared with lightweight models (models with less than 1 million parameters), and reaches a similar performance when compare with heavy architectures (models with about 35 million parameters). Both quantitative and qualitative results and comparisons with state-of-the-art models, using different edge detection datasets, are provided. The proposed LDC does not use pre-trained weights and requires straightforward hyper-parameter settings. The source code is released at https://github.com/xavysp/LDC
|
|
|
Joan Serrat, Felipe Lumbreras, & Idoia Ruiz. (2018). Learning to measure for preshipment garment sizing. MEASURE - Measurement, 130, 327–339.
Abstract: Clothing is still manually manufactured for the most part nowadays, resulting in discrepancies between nominal and real dimensions, and potentially ill-fitting garments. Hence, it is common in the apparel industry to manually perform measures at preshipment time. We present an automatic method to obtain such measures from a single image of a garment that speeds up this task. It is generic and extensible in the sense that it does not depend explicitly on the garment shape or type. Instead, it learns through a probabilistic graphical model to identify the different contour parts. Subsequently, a set of Lasso regressors, one per desired measure, can predict the actual values of the measures. We present results on a dataset of 130 images of jackets and 98 of pants, of varying sizes and styles, obtaining 1.17 and 1.22 cm of mean absolute error, respectively.
Keywords: Apparel; Computer vision; Structured prediction; Regression
|
|
|
Cristhian A. Aguilera-Carrasco, C. Aguilera, & Angel Sappa. (2018). Melamine Faced Panels Defect Classification beyond the Visible Spectrum. SENS - Sensors, 18(11), 1–10.
Abstract: In this work, we explore the use of images from different spectral bands to classify defects in melamine faced panels, which could appear through the production process. Through experimental evaluation, we evaluate the use of images from the visible (VS), near-infrared (NIR), and long wavelength infrared (LWIR), to classify the defects using a feature descriptor learning approach together with a support vector machine classifier. Two descriptors were evaluated, Extended Local Binary Patterns (E-LBP) and SURF using a Bag of Words (BoW) representation. The evaluation was carried on with an image set obtained during this work, which contained five different defect categories that currently occurs in the industry. Results show that using images from beyond the visual spectrum helps to improve classification performance in contrast with a single visible spectrum solution.
Keywords: industrial application; infrared; machine learning
|
|
|
Henry Velesaca, Gisel Bastidas-Guacho, Mohammad Rouhani, & Angel Sappa. (2024). Multimodal image registration techniques: a comprehensive survey. MTAP - Multimedia Tools and Applications, .
Abstract: This manuscript presents a review of state-of-the-art techniques proposed in the literature for multimodal image registration, addressing instances where images from different modalities need to be precisely aligned in the same reference system. This scenario arises when the images to be registered come from different modalities, among the visible and thermal spectral bands, 3D-RGB, or flash-no flash, or NIR-visible. The review spans different techniques from classical approaches to more modern ones based on deep learning, aiming to highlight the particularities required at each step in the registration pipeline when dealing with multimodal images. It is noteworthy that medical images are excluded from this review due to their specific characteristics, including the use of both active and passive sensors or the non-rigid nature of the body contained in the image.
|
|