Publicacions CVC -- Query Results


Records
Author	Hannes Mueller; Andre Groeger; Jonathan Hersh; Andrea Matranga; Joan Serrat
Title	Monitoring war destruction from space using machine learning			Type	Journal Article
Year	2021	Publication	Proceedings of the National Academy of Sciences of the United States of America	Abbreviated Journal	PNAS
Volume	118	Issue	23	Pages	e2025400118
Keywords
Abstract	Existing data on building destruction in conflict zones rely on eyewitness reports or manual detection, which makes it generally scarce, incomplete, and potentially biased. This lack of reliable data imposes severe limitations for media reporting, humanitarian relief efforts, human-rights monitoring, reconstruction initiatives, and academic studies of violent conflict. This article introduces an automated method of measuring destruction in high-resolution satellite images using deep-learning techniques combined with label augmentation and spatial and temporal smoothing, which exploit the underlying spatial and temporal structure of destruction. As a proof of concept, we apply this method to the Syrian civil war and reconstruct the evolution of damage in major cities across the country. Our approach allows generating destruction data with unprecedented scope, resolution, and frequency—and makes use of the ever-higher frequency at which satellite imagery becomes available.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; 600.118			Approved	no
Call Number	Admin @ si @ MGH2021			Serial	3584
Permanent link to this record



Author	Bhaskar Chakraborty; Jordi Gonzalez; Xavier Roca
Title	Large scale continuous visual event recognition using max-margin Hough transformation framework			Type	Journal Article
Year	2013	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	117	Issue	10	Pages	1356–1368
Keywords
Abstract	In this paper we propose a novel method for continuous visual event recognition (CVER) on a large-scale video dataset using a max-margin Hough transformation framework. Due to the large scale of the data, diverse real-world conditions and wide scene variability, direct application of action recognition/detection methods, such as spatio-temporal interest point (STIP) local-feature-based techniques, to the whole dataset is practically infeasible. To address this problem, we apply a motion region extraction technique, based on motion segmentation and region clustering, to identify possible candidate “events of interest” as a preprocessing step. On these candidate regions a STIP detector is applied and local motion features are computed. For activity representation we use a generalized Hough transform framework in which each feature point casts a weighted vote for a possible activity class centre. A max-margin framework is applied to learn the feature codebook weights. For activity detection, peaks in the Hough voting space are taken into account and an initial event hypothesis is generated using the spatio-temporal information of the participating STIPs. For event recognition a verification Support Vector Machine is used. An extensive evaluation on a benchmark large-scale video surveillance dataset (VIRAT), as well as on a small-scale benchmark dataset (MSR), shows that the proposed method is applicable to a wide range of continuous visual event recognition applications under extremely challenging conditions.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1077-3142	ISBN		Medium
Area		Expedition		Conference
Notes	ISE			Approved	no
Call Number	Admin @ si @ CGR2013			Serial	2413
Permanent link to this record
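The weighted voting step mentioned in the abstract above (each feature casts a weighted vote for an activity centre, and peaks in the voting space become hypotheses) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `hough_peak`, the `(x, y, t, weight)` vote layout, and the fixed grid cell size are hypothetical and are not taken from the paper, which learns its weights with a max-margin procedure.

```python
from collections import defaultdict


def hough_peak(votes, cell=1.0):
    """Accumulate weighted votes in a coarse (x, y, t) grid and return the peak.

    votes: iterable of (x, y, t, weight) tuples, one per local feature.
    cell:  grid cell size (hypothetical parameter, same units as x, y, t).
    """
    acc = defaultdict(float)
    for x, y, t, w in votes:
        # Quantise the voted centre so nearby votes land in the same cell.
        key = (int(x // cell), int(y // cell), int(t // cell))
        acc[key] += w
    if not acc:
        return None, 0.0
    best = max(acc, key=acc.get)
    return best, acc[best]


# Toy data: two features agree on roughly the same centre, one disagrees.
votes = [
    (10.2, 5.1, 3.0, 0.9),
    (10.7, 5.4, 3.2, 0.8),
    (2.0, 8.0, 1.0, 1.0),
]
peak, score = hough_peak(votes)
```

In this toy case the two agreeing votes outweigh the single heavier vote, which is the behaviour that makes voting schemes tolerant of isolated outlier features.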



Author	Bhaskar Chakraborty; Michael Holte; Thomas B. Moeslund; Jordi Gonzalez
Title	Selective Spatio-Temporal Interest Points			Type	Journal Article
Year	2012	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	116	Issue	3	Pages	396-410
Keywords
Abstract	Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH), validates our approach and shows state-of-the-art performance. Due to the unavailability of ground truth action annotation data for the Multi-KTH dataset, we introduce an actor specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in the realistic scenario, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques.
Address
Corporate Author				Thesis
Publisher	Elsevier	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1077-3142	ISBN		Medium
Area		Expedition		Conference
Notes	ISE			Approved	no
Call Number	Admin @ si @ CHM2012			Serial	1806
Permanent link to this record



Author	Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu
Title	Low-dimensional and Comprehensive Color Texture Description			Type	Journal Article
Year	2012	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	116	Issue	1	Pages	54-67
Keywords
Abstract	Image retrieval can be addressed by combining standard descriptors, such as those of MPEG-7, which are defined independently for each visual cue (e.g. SCD or CLD for color, HTD for texture or EHD for edges). A common problem is how to combine similarities coming from descriptors representing different concepts in different spaces. In this paper we propose a color texture description that bypasses this problem by its inherent definition. It is based on a low-dimensional space with 6 perceptual axes. Texture is described in a 3D space derived from a direct implementation of Julesz’s original Texton theory, and color is described in a 3D perceptual space. This early fusion through the blob concept in these two bounded spaces avoids the problem and allows us to derive a sparse color-texture descriptor that achieves performance similar to MPEG-7 in image retrieval. Moreover, our descriptor is comprehensive, since it can also be applied to either segmentation or browsing: (a) a dense image representation is defined from the descriptor, showing reasonable performance in locating texture patterns included in complex images; and (b) a vocabulary of basic terms is derived to build an intermediate-level descriptor in natural language, improving browsing by bridging the semantic gap.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1077-3142	ISBN		Medium
Area		Expedition		Conference
Notes	CAT;CIC			Approved	no
Call Number	Admin @ si @ ASV2012			Serial	1827
Permanent link to this record



Author	Jordi Gonzalez; Thomas B. Moeslund; Liang Wang
Title	Semantic Understanding of Human Behaviors in Image Sequences: From video-surveillance to video-hermeneutics			Type	Journal Article
Year	2012	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	116	Issue	3	Pages	305–306
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1077-3142	ISBN		Medium
Area		Expedition		Conference
Notes	ISE			Approved	no
Call Number	Admin @ si @ GMW2012			Serial	2005
Permanent link to this record



Author	Miquel Ferrer; Dimosthenis Karatzas; Ernest Valveny; I. Bardaji; Horst Bunke
Title	A Generic Framework for Median Graph Computation based on a Recursive Embedding Approach			Type	Journal Article
Year	2011	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	115	Issue	7	Pages	919-928
Keywords	Median Graph, Graph Embedding, Graph Matching, Structural Pattern Recognition
Abstract	The median graph has been shown to be a good choice to obtain a representative of a set of graphs. However, its computation is a complex problem. Recently, graph embedding into vector spaces has been proposed to obtain approximations of the median graph. The problem with such an approach is how to go from a point in the vector space back to a graph in the graph space. The main contribution of this paper is the generalization of this previous method, proposing a generic recursive procedure that permits to recover the graph corresponding to a point in the vector space, introducing only the amount of approximation inherent to the use of graph matching algorithms. In order to evaluate the proposed method, we compare it with the set median and with the other state-of-the-art embedding-based methods for the median graph computation. The experiments are carried out using four different databases (one semi-artificial and three containing real-world data). Results show that with the proposed approach we can obtain better medians, in terms of the sum of distances to the training graphs, than with the previous existing methods.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	IAM @ iam @ FKV2011			Serial	1831
Permanent link to this record
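The set median used as a baseline in the abstract above is the member of a set that minimises the sum of distances to all other members. A minimal sketch follows, under loudly stated assumptions: graphs are reduced to sets of edges, and the size of the symmetric difference stands in for a real graph edit distance. Neither choice comes from the paper, and the names `set_median` and `edge_distance` are hypothetical.

```python
def edge_distance(g1, g2):
    """Toy stand-in for a graph distance: number of edges not shared."""
    return len(g1 ^ g2)


def set_median(items, distance):
    """Return the item with the smallest sum of distances to all items."""
    best_item, best_cost = None, float("inf")
    for cand in items:
        cost = sum(distance(cand, other) for other in items)
        if cost < best_cost:
            best_item, best_cost = cand, cost
    return best_item, best_cost


# Three small graphs, each given as a frozenset of undirected edges.
g1 = frozenset({(1, 2), (2, 3)})
g2 = frozenset({(1, 2), (2, 3), (3, 4)})
g3 = frozenset({(1, 2), (3, 4)})

median, cost = set_median([g1, g2, g3], edge_distance)
```

The set median is restricted to graphs already in the set, which is why the abstract treats it as a reference point: a generalised median may lie outside the set and achieve a lower sum of distances.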



Author	David Geronimo; Angel Sappa; Daniel Ponsa; Antonio Lopez
Title	2D-3D based on-board pedestrian detection system			Type	Journal Article
Year	2010	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	114	Issue	5	Pages	583–595
Keywords	Pedestrian detection; Advanced Driver Assistance Systems; Horizon line; Haar wavelets; Edge orientation histograms
Abstract	During the next decade, on-board pedestrian detection systems will play a key role in the challenge of increasing traffic safety. The main target of these systems, to detect pedestrians in urban scenarios, implies overcoming difficulties like processing outdoor scenes from a mobile platform and searching for aspect-changing objects in cluttered environments. Such systems therefore need to combine state-of-the-art computer vision techniques. In this paper we present a three-module system based on both 2D and 3D cues. The first module uses 3D information to estimate the road plane parameters and thus select a coherent set of regions of interest (ROIs) to be further analyzed. The second module uses Real AdaBoost and a combined set of Haar wavelets and edge orientation histograms to classify the incoming ROIs as pedestrian or non-pedestrian. The final module loops again with the 3D cue in order to verify the classified ROIs and with the 2D cue in order to refine the final results. According to the results, the integration of the proposed techniques gives rise to a promising system.
Address	Computer Vision and Image Understanding (Special Issue on Intelligent Vision Systems), Vol. 114(5):583-595
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1077-3142	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS			Approved	no
Call Number	ADAS @ adas @ GSP2010			Serial	1341
Permanent link to this record



Author	Fernando Vilariño; Debora Gil; Petia Radeva
Title	A Novel FLDA Formulation for Numerical Stability Analysis			Type	Book Chapter
Year	2004	Publication	Recent Advances in Artificial Intelligence Research and Development	Abbreviated Journal
Volume	113	Issue		Pages	77-84
Keywords	Supervised Learning; Linear Discriminant Analysis; Numerical Stability; Computer Vision
Abstract	Fisher Linear Discriminant Analysis (FLDA) is one of the most popular techniques used in classification applying dimensional reduction. The numerical scheme involves the inversion of the within-class scatter matrix, which makes FLDA potentially ill-conditioned when it becomes singular. In this paper we present a novel explicit formulation of FLDA in terms of the eccentricity ratio and eigenvector orientations of the within-class scatter matrix. An analysis of this function will characterize those situations where FLDA response is not reliable because of numerical instability. This can solve common situations of poor classification performance in computer vision.
Address
Corporate Author				Thesis
Publisher	IOS Press	Place of Publication		Editor	J. Vitrià, P. Radeva and I. Aguiló
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-1-58603-466-5	Medium
Area		Expedition		Conference
Notes	MV;IAM;MILAB;SIAI			Approved	no
Call Number	IAM @ iam @ VGR2004			Serial	1663
Permanent link to this record
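The ill-conditioning mentioned in the abstract above arises because FLDA inverts the within-class scatter matrix. A rough way to see when that inversion is unsafe is to look at the ratio of the matrix's largest to smallest eigenvalue. The sketch below does this for 2D data only. It is an assumption-laden illustration: the abstract does not define its "eccentricity ratio", so the eigenvalue ratio used here is a stand-in, and all function names are hypothetical.

```python
import math


def within_class_scatter(classes):
    """Sum the 2x2 scatter matrices of each class; returns (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    for points in classes:
        n = len(points)
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    return sxx, sxy, syy


def eigenvalue_ratio(sxx, sxy, syy, tol=1e-12):
    """Ratio of largest to smallest eigenvalue of a symmetric 2x2 matrix.

    Returns math.inf when the matrix is (numerically) singular.
    """
    mean = (sxx + syy) / 2.0
    dev = math.hypot((sxx - syy) / 2.0, sxy)
    lam_max, lam_min = mean + dev, mean - dev
    if lam_min <= tol * lam_max:
        return math.inf
    return lam_max / lam_min


# Both classes lie on the line y = x, so the scatter matrix is singular.
collinear = [[(0, 0), (1, 1), (2, 2)], [(3, 3), (4, 4)]]
# A single square-shaped class has equal spread in both directions.
square = [[(0, 0), (1, 0), (0, 1), (1, 1)]]

bad = eigenvalue_ratio(*within_class_scatter(collinear))
good = eigenvalue_ratio(*within_class_scatter(square))
```

A very large ratio signals that the discriminant direction depends heavily on a nearly degenerate axis of the data, which is the kind of situation the abstract describes as unreliable.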



Author	Thanh Nam Le; Muhammad Muzzamil Luqman; Anjan Dutta; Pierre Heroux; Christophe Rigaud; Clement Guerin; Pasquale Foggia; Jean Christophe Burie; Jean Marc Ogier; Josep Llados; Sebastien Adam
Title	Subgraph spotting in graph representations of comic book images			Type	Journal Article
Year	2018	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
Volume	112	Issue		Pages	118-124
Keywords	Attributed graph; Region adjacency graph; Graph matching; Graph isomorphism; Subgraph isomorphism; Subgraph spotting; Graph indexing; Graph retrieval; Query by example; Dataset and comic book images
Abstract	Graph-based representations are the most powerful data structures for extracting, representing and preserving the structural information of underlying data. Subgraph spotting is an interesting research problem, especially for studying and investigating the structural information based content-based image retrieval (CBIR) and query by example (QBE) in image databases. In this paper we address the problem of lack of freely available ground-truthed datasets for subgraph spotting and present a new dataset for subgraph spotting in graph representations of comic book images (SSGCI) with its ground-truth and evaluation protocol. Experimental results of two state-of-the-art methods of subgraph spotting are presented on the new SSGCI dataset.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.097; 600.121			Approved	no
Call Number	Admin @ si @ LLD2018			Serial	3150
Permanent link to this record



Author	Juan Jose Rubio; Takahiro Kashiwa; Teera Laiteerapong; Wenlong Deng; Kohei Nagai; Sergio Escalera; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger
Title	Multi-class structural damage segmentation using fully convolutional networks			Type	Journal Article
Year	2019	Publication	Computers in Industry	Abbreviated Journal	COMPUTIND
Volume	112	Issue		Pages	103121
Keywords	Bridge damage detection; Deep learning; Semantic segmentation
Abstract	Structural Health Monitoring (SHM) has benefited from computer vision and more recently, Deep Learning approaches, to accurately estimate the state of deterioration of infrastructure. In our work, we test Fully Convolutional Networks (FCNs) with a dataset of deck areas of bridges for damage segmentation. We create a dataset for delamination and rebar exposure that has been collected from inspection records of bridges in Niigata Prefecture, Japan. The dataset consists of 734 images with three labels per image, which makes it the largest dataset of images of bridge deck damage. This data allows us to estimate the performance of our method based on regions of agreement, which emulates the uncertainty of in-field inspections. We demonstrate the practicality of FCNs to perform automated semantic segmentation of surface damages. Our model achieves a mean accuracy of 89.7% for delamination and 78.4% for rebar exposure, and a weighted F1 score of 81.9%.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no proj			Approved	no
Call Number	Admin @ si @ RKL2019			Serial	3315
Permanent link to this record



Author	Lei Kang; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol
Title	Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture			Type	Journal Article
Year	2021	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	112	Issue		Pages	107790
Keywords
Abstract	Sequence-to-sequence models have recently become very popular for tackling handwritten word recognition problems. However, how to effectively integrate an external language model into such a recognizer is still a challenging problem. The main challenge faced when training a language model is dealing with the language model corpus, which is usually different from the one used for training the handwritten word recognition system. Thus, the bias between the two word corpora leads to incorrect transcriptions, providing similar or even worse performance on the recognition task. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model into a sequence-to-sequence architecture. Moreover, it provides suggestions from external language knowledge as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the sequence-to-sequence recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose the importance of the information provided by the language model. On the other hand, the external language model has the ability to adapt itself to the training corpus and even learn the most common errors produced by the recognizer. Finally, comprehensive experiments show that Candidate Fusion outperforms the state-of-the-art language models for handwritten word recognition tasks.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.140; 601.302; 601.312; 600.121			Approved	no
Call Number	Admin @ si @ KRV2021			Serial	3343
Permanent link to this record



Author	Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca
Title	Camera pose estimation in multi-view environments: From virtual scenarios to the real world			Type	Journal Article
Year	2021	Publication	Image and Vision Computing	Abbreviated Journal	IVC
Volume	110	Issue		Pages	104182
Keywords
Abstract	This paper presents a domain adaptation strategy to efficiently train network architectures for estimating the relative camera pose in multi-view scenarios. The network architectures are fed by a pair of simultaneously acquired images; hence, in order to improve the accuracy of the solutions, and due to the lack of large datasets with pairs of overlapped images, a domain adaptation strategy is proposed. The domain adaptation strategy consists of transferring the knowledge learned from synthetic images to real-world scenarios. For this, the networks are first trained using pairs of synthetic images, which are captured at the same time by a pair of cameras in a virtual environment; then, the learned weights of the networks are transferred to the real-world case, where the networks are retrained with a few real images. Different virtual 3D scenarios are generated to evaluate the relationship between the accuracy of the result and the similarity between virtual and real scenarios, where similarity covers both the geometry of the objects contained in the scene and the relative pose between camera and objects in the scene. Experimental results and comparisons are provided showing that the accuracy of all the evaluated networks for estimating the camera pose improves when the proposed domain adaptation strategy is used, highlighting the importance of the similarity between virtual and real scenarios.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MSIAU; 600.130; 600.122			Approved	no
Call Number	Admin @ si @ CSV2021			Serial	3577
Permanent link to this record



Author	Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger
Title	Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network			Type	Journal Article
Year	2020	Publication	Automation in Construction	Abbreviated Journal	AC
Volume	110	Issue		Pages	102973
Keywords	Semantic image segmentation; Deep learning
Abstract	Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches have been widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of detecting structural damage on bridge surfaces, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damage requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area could be very small, whereas the background area is large, which creates an unbalanced training environment; (iii) because it is difficult to determine the exact extent of the damage, there is often variation among different labelers who perform pixel-wise labeling. In this paper, we propose a novel model for bridge structural damage detection to address the first two challenges. The model follows the idea of an atrous spatial pyramid pooling (ASPP) module, designed as a novel network for bridge damage detection. Further, we introduce the weight balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection, as compared to cross entropy loss or focal loss, and (ii) the proposed model has a better ability to detect a minority class than other light segmentation networks.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no proj			Approved	no
Call Number	Admin @ si @ DMK2020			Serial	3314
Permanent link to this record
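The abstract above names a weight balanced Intersection over Union (IoU) loss for unbalanced segmentation but does not give its formula. The sketch below shows one common way such a loss is written, a per-class soft IoU combined with class weights. This is an assumption for illustration, not the paper's definition: the function name `weighted_iou_loss`, the flat per-class pixel lists, and the normalisation by the sum of weights are all hypothetical.

```python
def weighted_iou_loss(probs, targets, weights, eps=1e-7):
    """Weighted soft IoU loss over classes.

    probs[c][i]:   predicted probability of class c at pixel i.
    targets[c][i]: 1 if pixel i belongs to class c, else 0.
    weights[c]:    importance of class c (e.g. larger for rare classes).
    """
    loss = 0.0
    for p_c, g_c, w in zip(probs, targets, weights):
        inter = sum(p * g for p, g in zip(p_c, g_c))
        union = sum(p + g - p * g for p, g in zip(p_c, g_c))
        # eps keeps the ratio defined when a class is absent from both maps.
        loss += w * (1.0 - (inter + eps) / (union + eps))
    return loss / sum(weights)


# Four pixels, two classes: class 0 is the rare "damage" class.
targets = [[1, 0, 0, 0], [0, 1, 1, 1]]
perfect = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
inverted = [[0.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
weights = [3.0, 1.0]  # hypothetical weights favouring the rare class

loss_perfect = weighted_iou_loss(perfect, targets, weights)
loss_inverted = weighted_iou_loss(inverted, targets, weights)
```

Because IoU is a ratio computed per class, a small damage region contributes as much to the loss as a large background region before weighting, which is the usual motivation for preferring it to pixel-wise cross entropy on unbalanced data.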



Author	Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
Title	Real-time Lexicon-free Scene Text Retrieval			Type	Journal Article
Year	2021	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	110	Issue		Pages	107656
Keywords
Abstract	In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.129; 601.338			Approved	no
Call Number	Admin @ si @ MTD2021			Serial	3493
Permanent link to this record
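The abstract above frames retrieval as a nearest neighbor search between a representation of the query text and the word representations collected from every image. A minimal sketch of that search follows. The embedding is a deliberate simplification and an assumption: a bag-of-characters count vector stands in for whatever compact representation the model actually predicts, and the names `embed`, `cosine` and `rank_images` are hypothetical.

```python
import math
import string

ALPHABET = string.ascii_lowercase + string.digits


def embed(word):
    """Toy text embedding: character counts over a fixed alphabet."""
    vec = [0.0] * len(ALPHABET)
    for ch in word.lower():
        idx = ALPHABET.find(ch)
        if idx >= 0:
            vec[idx] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_images(query, index):
    """Rank image ids by their best-matching spotted word.

    index: dict mapping image id -> list of word vectors found in the image.
    """
    q = embed(query)
    scores = {
        img: max((cosine(q, v) for v in vecs), default=0.0)
        for img, vecs in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


index = {
    "img_a": [embed("exit"), embed("stop")],
    "img_b": [embed("hotel")],
    "img_c": [],
}
ranking = rank_images("stop", index)
```

Character counts ignore letter order, so anagrams such as "stop" and "pots" collide. A real system would need an order-aware representation, but the ranking logic stays the same.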



Author	Meysam Madadi; Hugo Bertiche; Sergio Escalera
Title	SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery			Type	Journal Article
Year	2020	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	106	Issue		Pages	107472
Keywords	Deep learning; 3D Human pose; Body shape; SMPL; Denoising autoencoder; Volumetric stack hourglass
Abstract	In this paper we propose to embed SMPL within a deep-based model to accurately estimate 3D pose and shape from a still RGB image. We use CNN-based 3D joint predictions as an intermediate representation to regress SMPL pose and shape parameters. Later, 3D joints are reconstructed again in the SMPL output. This module can be seen as an autoencoder where the encoder is a deep neural network and the decoder is SMPL model. We refer to this as SMPL reverse (SMPLR). By implementing SMPLR as an encoder-decoder we avoid the need of complex constraints on pose and shape. Furthermore, given that in-the-wild datasets usually lack accurate 3D annotations, it is desirable to lift 2D joints to 3D without pairing 3D annotations with RGB images. Therefore, we also propose a denoising autoencoder (DAE) module between CNN and SMPLR, able to lift 2D joints to 3D and partially recover from structured error. We evaluate our method on SURREAL and Human3.6M datasets, showing improvement over SMPL-based state-of-the-art alternatives by about 4 and 12 mm, respectively.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no proj			Approved	no
Call Number	Admin @ si @ MBE2020			Serial	3439
Permanent link to this record