Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	661–675 of 3413 records found matching your query (RSS):

Search & Display Options

Select All Deselect All

[31–40] << 41 42 43 44 45 46 47 48 49 50 >> [51–60]

List View

Citations

Details

	Records
	Author	Yagmur Gucluturk; Umut Guclu; Marc Perez; Hugo Jair Escalante; Xavier Baro; Isabelle Guyon; Carlos Andujar; Julio C. S. Jacques Junior; Meysam Madadi; Sergio Escalera
	Title	Visualizing Apparent Personality Analysis with Deep Residual Networks			Type	Conference Article
	Year	2017	Publication	Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV	Abbreviated Journal
	Volume		Issue		Pages	3101-3109
	Keywords
	Abstract	Automatic prediction of personality traits is a subjective task that has recently received much attention. Specifically, automatic apparent personality trait prediction from multimodal data has emerged as a hot topic within the filed of computer vision and, more particularly, the so called “looking at people” sub-field. Considering “apparent” personality traits as opposed to real ones considerably reduces the subjectivity of the task. The real world applications are encountered in a wide range of domains, including entertainment, health, human computer interaction, recruitment and security. Predictive models of personality traits are useful for individuals in many scenarios (e.g., preparing for job interviews, preparing for public speaking). However, these predictions in and of themselves might be deemed to be untrustworthy without human understandable supportive evidence. Through a series of experiments on a recently released benchmark dataset for automatic apparent personality trait prediction, this paper characterizes the audio and visual information that is used by a state-of-the-art model while making its predictions, so as to provide such supportive evidence by explaining predictions made. Additionally, the paper describes a new web application, which gives feedback on apparent personality traits of its users by combining model predictions with their explanations.
	Address	Venice; Italy; October 2017
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; 6002.143			Approved	no
	Call Number	Admin @ si @ GGP2017			Serial	3067
Permanent link to this record



	Author	Maryam Asadi-Aghbolaghi; Hugo Bertiche; Vicent Roig; Shohreh Kasaei; Sergio Escalera
	Title	Action Recognition from RGB-D Data: Comparison and Fusion of Spatio-temporal Handcrafted Features and Deep Strategies			Type	Conference Article
	Year	2017	Publication	Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address	Venice; Italy; October 2017
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no menciona			Approved	no
	Call Number	Admin @ si @ ABR2017			Serial	3068
Permanent link to this record



	Author	Albert Clapes; Tinne Tuytelaars; Sergio Escalera
	Title	Darwintrees for action recognition			Type	Conference Article
	Year	2017	Publication	Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no menciona			Approved	no
	Call Number	Admin @ si @ CTE2017			Serial	3069
Permanent link to this record



	Author	Lichao Zhang; Martin Danelljan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan
	Title	Multi-Modal Fusion for End-to-End RGB-T Tracking			Type	Conference Article
	Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2252-2261
	Keywords
	Abstract	We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on VOT-RGBT2019 dataset and RGBT210 dataset, evaluating each type of modality fusing on each model component. The results show that the proposed fusion mechanisms improve the performance of the single modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on VOT-RGBT2019 dataset. With this fusion mechanism we achieve the state-of-the-art performance on RGBT210 dataset.
	Address	Seul; Corea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	LAMP; 600.109; 600.141; 600.120			Approved	no
	Call Number	Admin @ si @ ZDG2019			Serial	3279
Permanent link to this record



	Author	Javad Zolfaghari Bengar; Abel Gonzalez-Garcia; Gabriel Villalonga; Bogdan Raducanu; Hamed H. Aghdam; Mikhail Mozerov; Antonio Lopez; Joost Van de Weijer
	Title	Temporal Coherence for Active Learning in Videos			Type	Conference Article
	Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	914-923
	Keywords
	Abstract	Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our active learning criterion is based on the estimated number of errors in terms of false positives and false negatives. The detections obtained by the object detector are used to define the nodes of a graph and tracked forward and backward to temporally link the nodes. Minimizing an energy function defined on this graphical model provides estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two datasets.
	Address	Seul; Corea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	LAMP; ADAS; 600.124; 602.200; 600.118; 600.120; 600.141			Approved	no
	Call Number	Admin @ si @ ZGV2019			Serial	3294
Permanent link to this record



	Author	Reza Azad; Maryam Asadi Aghbolaghi; Mahmood Fathy; Sergio Escalera
	Title	Bi-Directional ConvLSTM U-Net with Densley Connected Convolutions			Type	Conference Article
	Year	2019	Publication	Visual Recognition for Medical Images workshop	Abbreviated Journal
	Volume		Issue		Pages	406-415
	Keywords
	Abstract	In recent years, deep learning-based networks have achieved state-of-the-art performance in medical image segmentation. Among the existing networks, U-Net has been successfully applied on medical image segmentation. In this paper, we propose an extension of U-Net, Bi-directional ConvLSTM U-Net with Densely connected convolutions (BCDU-Net), for medical image segmentation, in which we take full advantages of U-Net, bi-directional ConvLSTM (BConvLSTM) and the mechanism of dense convolutions. Instead of a simple concatenation in the skip connection of U-Net, we employ BConvLSTM to combine the feature maps extracted from the corresponding encoding path and the previous decoding up-convolutional layer in a non-linear way. To strengthen feature propagation and encourage feature reuse, we use densely connected convolutions in the last convolutional layer of the encoding path. Finally, we can accelerate the convergence speed of the proposed network by employing batch normalization (BN). The proposed model is evaluated on three datasets of: retinal blood vessel segmentation, skin lesion segmentation, and lung nodule segmentation, achieving state-of-the-art performance.
	Address	Seul; Korea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ AAF2019			Serial	3324
Permanent link to this record



	Author	Mohammed Al Rawi; Ernest Valveny
	Title	Compact and Efficient Multitask Learning in Vision, Language and Speech			Type	Conference Article
	Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2933-2942
	Keywords
	Abstract	Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains.
	Address	Seul; Korea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ RaV2019			Serial	3365
Permanent link to this record



	Author	Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
	Title	Seeing and Hearing Egocentric Actions: How Much Can We Learn?			Type	Conference Article
	Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	4470-4480
	Keywords
	Abstract	Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have considered to integrate the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.
	Address	Seul; Korea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ CLR2019b			Serial	3385
Permanent link to this record



	Author	Reza Azad; Afshin Bozorgpour; Maryam Asadi-Aghbolaghi; Dorit Merhof; Sergio Escalera
	Title	Deep Frequency Re-Calibration U-Net for Medical Image Segmentation			Type	Conference Article
	Year	2021	Publication	IEEE/CVF International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	3274-3283
	Keywords
	Abstract	We present a novel solution to the garment animation problem through deep learning. Our contribution allows animating any template outfit with arbitrary topology and geometric complexity. Recent works develop models for garment edition, resizing and animation at the same time by leveraging the support body model (encoding garments as body homotopies). This leads to complex engineering solutions that suffer from scalability, applicability and compatibility. By limiting our scope to garment animation only, we are able to propose a simple model that can animate any outfit, independently of its topology, vertex order or connectivity. Our proposed architecture maps outfits to animated 3D models into the standard format for 3D animation (blend weights and blend shapes matrices), automatically providing of compatibility with any graphics engine. We also propose a methodology to complement supervised learning with an unsupervised physically based learning that implicitly solves collisions and enhances cloth quality.
	Address	VIRTUAL; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ ABA2021			Serial	3645
Permanent link to this record



	Author	Ajian Liu; Chenxu Zhao; Zitong Yu; Anyang Su; Xing Liu; Zijian Kong; Jun Wan; Sergio Escalera; Hugo Jair Escalante; Zhen Lei; Guodong Guo
	Title	3D High-Fidelity Mask Face Presentation Attack Detection Challenge			Type	Conference Article
	Year	2021	Publication	IEEE/CVF International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	814-823
	Keywords
	Abstract	The threat of 3D mask to face recognition systems is increasing serious, and has been widely concerned by researchers. To facilitate the study of the algorithms, a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask) has been collected. Specifically, it consists of total amount of 54,600 videos which are recorded from 75 subjects with 225 realistic masks under 7 new kinds of sensors. Based on this dataset and Protocol 3 which evaluates both the discrimination and generalization ability of the algorithm under the open set scenarios, we organized a 3D High-Fidelity Mask Face Presentation Attack Detection Challenge to boost the research of 3D mask based attack detection. It attracted more than 200 teams for the development phase with a total of 18 teams qualifying for the final round. All the results were verified and re-ran by the organizing team, and the results were used for the final ranking. This paper presents an overview of the challenge, including the introduction of the dataset used, the definition of the protocol, the calculation of the evaluation criteria, and the summary and publication of the competition results. Finally, we focus on introducing and analyzing the top ranked algorithms, the conclusion summary, and the research ideas for mask attack detection provided by this competition.
	Address	Virtual; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ LZY2021			Serial	3646
Permanent link to this record



	Author	Claudia Greco; Carmela Buono; Pau Buch-Cardona; Gennaro Cordasco; Sergio Escalera; Anna Esposito; Anais Fernandez; Daria Kyslitska; Maria Stylianou Kornes; Cristina Palmero; Jofre Tenorio Laranga; Anna Torp Johansen; Maria Ines Torres
	Title	Emotional Features of Interactions With Empathic Agents			Type	Conference Article
	Year	2021	Publication	IEEE/CVF International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2168-2176
	Keywords
	Abstract	The current study is part of the EMPATHIC project, whose aim is to develop an Empathic Virtual Coach (VC) capable of promoting healthy and independent aging. To this end, the VC needs to be capable of perceiving the emotional states of users and adjusting its behaviour during the interactions according to what the users are experiencing in terms of emotions and comfort. Thus, the present work focuses on some sessions where elderly users of three different countries interact with a simulated system. Audio and video information extracted from these sessions were examined by external observers to assess participants' emotional experience with the EMPATHIC-VC in terms of categorical and dimensional assessment of emotions. Analyses were conducted on the emotional labels assigned by the external observers while participants were engaged in two different scenarios: a generic one, where the interaction was carried out with no intention to discuss a specific topic, and a nutrition one, aimed to accomplish a conversation on users' nutritional habits. Results of analyses performed on both audio and video data revealed that the EMPATHIC coach did not elicit negative feelings in the users. Indeed, users from all countries have shown relaxed and positive behavior when interacting with the simulated VC during both scenarios. Overall, the EMPATHIC-VC was capable to offer an enjoyable experience without eliciting negative feelings in the users. This supports the hypothesis that an Empathic Virtual Coach capable of considering users' expectations and emotional states could support elderly people in daily life activities and help them to remain independent.
	Address	VIRTUAL; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ GBB2021			Serial	3647
Permanent link to this record



	Author	David Curto; Albert Clapes; Javier Selva; Sorina Smeureanu; Julio C. S. Jacques Junior; David Gallardo-Pujol; Georgina Guilera; David Leiva; Thomas B. Moeslund; Sergio Escalera; Cristina Palmero
	Title	Dyadformer: A Multi-Modal Transformer for Long-Range Modeling of Dyadic Interactions			Type	Conference Article
	Year	2021	Publication	IEEE/CVF International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2177-2188
	Keywords
	Abstract	Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to model individual and interpersonal features in dyadic interactions using variable time windows, thus allowing the capture of long-term interdependencies. Our proposed cross-subject layer allows the network to explicitly model interactions among subjects through attentional operations. This proof-of-concept approach shows how multi-modality and joint modeling of both interactants for longer periods of time helps to predict individual attributes. With Dyadformer, we improve state-of-the-art self-reported personality inference results on individual subjects on the UDIVA v0.5 dataset.
	Address	Virtual; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ CCS2021			Serial	3648
Permanent link to this record



	Author	Neelu Madan; Arya Farkhondeh; Kamal Nasrollahi; Sergio Escalera; Thomas B. Moeslund
	Title	Temporal Cues From Socially Unacceptable Trajectories for Anomaly Detection			Type	Conference Article
	Year	2021	Publication	IEEE/CVF International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2150-2158
	Keywords
	Abstract	State-of-the-Art (SoTA) deep learning-based approaches to detect anomalies in surveillance videos utilize limited temporal information, including basic information from motion, e.g., optical flow computed between consecutive frames. In this paper, we compliment the SoTA methods by including long-range dependencies from trajectories for anomaly detection. To achieve that, we first created trajectories by running a tracker on two SoTA datasets, namely Avenue and Shanghai-Tech. We propose a prediction-based anomaly detection method using trajectories based on Social GANs, also called in this paper as temporal-based anomaly detection. Then, we hypothesize that late fusion of the result of this temporal-based anomaly detection system with spatial-based anomaly detection systems produces SoTA results. We verify this hypothesis on two spatial-based anomaly detection systems. We show that both cases produce results better than baseline spatial-based systems, indicating the usefulness of the temporal information coming from the trajectories for anomaly detection. We observe that the proposed approach depicts the maximum improvement in micro-level Area-Under-the-Curve (AUC) by 4.1% on CUHK Avenue and 3.4% on Shanghai-Tech over one of the baseline method. We also show a high performance on cross-data evaluation, where we learn the weights to combine spatial and temporal information on Shanghai-Tech and perform evaluation on CUHK Avenue and vice-versa.
	Address	Virtual; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ MFN2021			Serial	3649
Permanent link to this record



	Author	Javad Zolfaghari Bengar; Joost Van de Weijer; Bartlomiej Twardowski; Bogdan Raducanu
	Title	Reducing Label Effort: Self- Supervised Meets Active Learning			Type	Conference Article
	Year	2021	Publication	International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	1631-1639
	Keywords
	Abstract	Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training that learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on few labeled samples. Recent developments in self-training have achieved very impressive results rivaling supervised learning on some datasets. The current work focuses on whether the two paradigms can benefit from each other. We studied object recognition datasets including CIFAR10, CIFAR100 and Tiny ImageNet with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high. The performance gap between active learning trained either with self-training or from scratch diminishes as we approach to the point where almost half of the dataset is labeled.
	Address	October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	LAMP; OR			Approved	no
	Call Number	Admin @ si @ ZVT2021			Serial	3672
Permanent link to this record



	Author	Shun Yao; Fei Yang; Yongmei Cheng; Mikhail Mozerov
	Title	3D Shapes Local Geometry Codes Learning with SDF			Type	Conference Article
	Year	2021	Publication	International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2110-2117
	Keywords
	Abstract	A signed distance function (SDF) as the 3D shape description is one of the most effective approaches to represent 3D geometry for rendering and reconstruction. Our work is inspired by the state-of-the-art method DeepSDF [17] that learns and analyzes the 3D shape as the iso-surface of its shell and this method has shown promising results especially in the 3D shape reconstruction and compression domain. In this paper, we consider the degeneration problem of reconstruction coming from the capacity decrease of the DeepSDF model, which approximates the SDF with a neural network and a single latent code. We propose Local Geometry Code Learning (LGCL), a model that improves the original DeepSDF results by learning from a local shape geometry of the full 3D shape. We add an extra graph neural network to split the single transmittable latent code into a set of local latent codes distributed on the 3D shape. Mentioned latent codes are used to approximate the SDF in their local regions, which will alleviate the complexity of the approximation compared to the original DeepSDF. Furthermore, we introduce a new geometric loss function to facilitate the training of these local latent codes. Note that other local shape adjusting methods use the 3D voxel representation, which in turn is a problem highly difficult to solve or even is insolvable. In contrast, our architecture is based on graph processing implicitly and performs the learning regression process directly in the latent code space, thus make the proposed architecture more flexible and also simple for realization. Our experiments on 3D shape reconstruction demonstrate that our LGCL method can keep more details with a significantly smaller size of the SDF decoder and outperforms considerably the original DeepSDF method under the most important quantitative metrics.
	Address	VIRTUAL; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	LAMP			Approved	no
	Call Number	Admin @ si @ YYC2021			Serial	3681
Permanent link to this record