Records | |||||
---|---|---|---|---|---|
Author | Saad Minhas; Aura Hernandez-Sabate; Shoaib Ehsan; Klaus McDonald-Maier | ||||
Title | Effects of Non-Driving Related Tasks During Self-Driving Mode | Type | Journal Article | ||
Year | 2022 | Publication | IEEE Transactions on Intelligent Transportation Systems | Abbreviated Journal | TITS |
Volume | 23 | Issue | 2 | Pages | 1391-1399
Keywords | |||||
Abstract | Perception reaction time and mental workload have proven to be crucial in manual driving. Moreover, in highly automated cars, where most of the research is focusing on Level 4 autonomous driving, take-over performance is also a key factor when taking road safety into account. This study aims to investigate how immersion in non-driving related tasks affects the take-over performance of drivers in given scenarios. The paper also highlights the use of virtual simulators to gather efficient data that can be crucial in easing the transition between manual and autonomous driving scenarios. The use of computer-aided simulation is especially important as the automotive industry rapidly moves towards autonomous technology. An experiment comprising 40 subjects was performed to examine the reaction times of drivers and the influence of other variables on the success of take-over performance in highly automated driving under different circumstances within a highway virtual environment. The results reflect the relationship between reaction times under the different scenarios that drivers might face under the circumstances stated above, as well as the importance of variables such as velocity in the success of regaining car control after automated driving. The implications of the results acquired are important for understanding the criteria needed for designing Human Machine Interfaces specifically aimed at automated driving conditions. Understanding the need to keep drivers in the loop during automation, whilst allowing them to safely engage in other non-driving related tasks, is an important research area which can be aided by the proposed study. | ||||
Address | Feb. 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM; 600.139; 600.145 | Approved | no | ||
Call Number | Admin @ si @ MHE2022 | Serial | 3468 | ||
Permanent link to this record | |||||
Author | Fatemeh Noroozi; Ciprian Corneanu; Dorota Kamińska; Tomasz Sapiński; Sergio Escalera; Gholamreza Anbarjafari | ||||
Title | Survey on Emotional Body Gesture Recognition | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Transactions on Affective Computing | Abbreviated Journal | TAC |
Volume | 12 | Issue | 2 | Pages | 505-523
Keywords | |||||
Abstract | Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as “body language” and comment on general aspects such as gender differences and culture dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment on static and dynamic body pose estimation methods both in RGB and 3D. We then comment on the recent literature related to representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g., human detection and pose estimation) are nowadays mature technologies fully developed for robust large-scale analysis, we show that for emotion recognition the quantity of labelled data is scarce, there is no agreement on clearly defined output spaces, and the representations are shallow and largely based on naive geometrical representations. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ NCK2021 | Serial | 3657 | ||
Permanent link to this record | |||||
Author | Kaustubh Kulkarni; Ciprian Corneanu; Ikechukwu Ofodile; Sergio Escalera; Xavier Baro; Sylwia Hyniewska; Juri Allik; Gholamreza Anbarjafari | ||||
Title | Automatic Recognition of Facial Displays of Unfelt Emotions | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Transactions on Affective Computing | Abbreviated Journal | TAC |
Volume | 12 | Issue | 2 | Pages | 377-390
Keywords | |||||
Abstract | Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average, it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state-of-the-art results on the CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on the BP4D dataset. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ KCO2021 | Serial | 3658 | ||
Permanent link to this record | |||||
Author | Michael Teutsch; Angel Sappa; Riad I. Hammoud | ||||
Title | Computer Vision in the Infrared Spectrum: Challenges and Approaches | Type | Book Whole | ||
Year | 2021 | Publication | Synthesis Lectures on Computer Vision | Abbreviated Journal | |
Volume | 10 | Issue | 2 | Pages | 1-138
Keywords | |||||
Abstract | Human visual perception is limited to the visual-optical spectrum. Machine vision is not. Cameras sensitive to the different infrared spectra can enhance the abilities of autonomous systems and visually perceive the environment in a holistic way. Relevant scene content can be made visible especially in situations where sensors of other modalities face issues, such as a visual-optical camera that needs a source of illumination. As a consequence, not only can human mistakes be avoided by increasing the level of automation, but also machine-induced errors can be reduced that, for example, could make a self-driving car crash into a pedestrian under difficult illumination conditions. Furthermore, multi-spectral sensor systems with infrared imagery as one modality are a rich source of information and can provably increase the robustness of many autonomous systems. Applications that can benefit from utilizing infrared imagery range from robotics to automotive and from biometrics to surveillance. In this book, we provide a brief yet concise introduction to the current state-of-the-art of computer vision and machine learning in the infrared spectrum. Based on various popular computer vision tasks such as image enhancement, object detection, or object tracking, we first motivate each task starting from established literature in the visual-optical spectrum. Then, we discuss the differences between processing images and videos in the visual-optical spectrum and the various infrared spectra. An overview of the current literature is provided together with an outlook for each task. Furthermore, available and annotated public datasets and common evaluation methods and metrics are presented. In a separate chapter, popular applications that can greatly benefit from the use of infrared imagery as a data source are presented and discussed. Among them are automatic target recognition, video surveillance, and biometrics including face recognition. Finally, we conclude with recommendations for well-fitting sensor setups and data processing algorithms for certain computer vision tasks. We address this book to prospective researchers and engineers new to the field but also to anyone who wants to get introduced to the challenges and the approaches of computer vision using infrared images or videos. Readers will be able to start their work directly after reading the book, supported by a highly comprehensive backlog of recent and relevant literature as well as related infrared datasets including existing evaluation frameworks. Together with consistently decreasing costs for infrared cameras, new fields of application appear and make computer vision in the infrared spectrum a great opportunity to face today's scientific and engineering challenges. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1636392431 | Medium | ||
Area | Expedition | Conference | |||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ TSH2021 | Serial | 3666 | ||
Permanent link to this record | |||||
Author | Alex Gomez-Villa; Adrian Martin; Javier Vazquez; Marcelo Bertalmio; Jesus Malo | ||||
Title | On the synthesis of visual illusions using deep generative models | Type | Journal Article | ||
Year | 2022 | Publication | Journal of Vision | Abbreviated Journal | JOV |
Volume | 22(8) | Issue | 2 | Pages | 1-18
Keywords | |||||
Abstract | Visual illusions expand our understanding of the visual system by imposing constraints in the models in two different ways: i) visual illusions for humans should induce equivalent illusions in the model, and ii) illusions synthesized from the model should be compelling for human viewers too. These constraints are alternative strategies to find good vision models. Following the first research strategy, recent studies have shown that artificial neural network architectures also have human-like illusory percepts when stimulated with classical hand-crafted stimuli designed to fool humans. In this work we focus on the second (less explored) strategy: we propose a framework to synthesize new visual illusions using the optimization abilities of current automatic differentiation techniques. The proposed framework can be used with classical vision models as well as with more recent artificial neural network architectures. This framework, validated by psychophysical experiments, can be used to study the difference between a vision model and the actual human perception and to optimize the vision model to decrease this difference. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.161; 611.007 | Approved | no | ||
Call Number | Admin @ si @ GMV2022 | Serial | 3682 | ||
Permanent link to this record | |||||
Author | David Berga; Xavier Otazu | ||||
Title | A neurodynamic model of saliency prediction in V1 | Type | Journal Article | ||
Year | 2022 | Publication | Neural Computation | Abbreviated Journal | NEURALCOMPUT |
Volume | 34 | Issue | 2 | Pages | 378-414
Keywords | |||||
Abstract | Lateral connections in the primary visual cortex (V1) have long been hypothesized to be responsible for several visual processing mechanisms such as brightness induction, chromatic induction, visual discomfort, and bottom-up visual attention (also named saliency). Many computational models have been developed to independently predict these and other visual processes, but no computational model has been able to reproduce all of them simultaneously. In this work, we show that a biologically plausible computational model of lateral interactions of V1 is able to simultaneously predict saliency and all the aforementioned visual processes. Our model's architecture (NSWAM) is based on Penacchio's neurodynamic model of lateral connections of V1. It is defined as a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation, and scale. We tested NSWAM saliency predictions using images from several eye tracking data sets. We show that the accuracy of predictions obtained by our architecture, using shuffled metrics, is similar to other state-of-the-art computational methods, particularly with synthetic images (CAT2000-Pattern and SID4VAM) that mainly contain low-level features. Moreover, we outperform other biologically inspired saliency models that are specifically designed to exclusively reproduce saliency. We show that our biologically plausible model of lateral connections can simultaneously explain different visual processes present in V1 (without applying any type of training or optimization and keeping the same parameterization for all the visual processes). This can be useful for the definition of a unified architecture of the primary visual cortex. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | NEUROBIT; 600.128; 600.120 | Approved | no | ||
Call Number | Admin @ si @ BeO2022 | Serial | 3696 | ||
Permanent link to this record | |||||
Author | Y. Mori; M. Misawa; Jorge Bernal; M. Bretthauer; S. Kudo; A. Rastogi; Gloria Fernandez Esparrach | ||||
Title | Artificial Intelligence for Disease Diagnosis-the Gold Standard Challenge | Type | Journal Article | ||
Year | 2022 | Publication | Gastrointestinal Endoscopy | Abbreviated Journal | |
Volume | 96 | Issue | 2 | Pages | 370-372
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ MMB2022 | Serial | 3701 | ||
Permanent link to this record | |||||
Author | Jose Luis Gomez; Gabriel Villalonga; Antonio Lopez | ||||
Title | Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models | Type | Journal Article | ||
Year | 2023 | Publication | Sensors – Special Issue on “Machine Learning for Autonomous Driving Perception and Prediction” | Abbreviated Journal | SENS |
Volume | 23 | Issue | 2 | Pages | 621
Keywords | Domain adaptation; semi-supervised learning; Semantic segmentation; Autonomous driving | ||||
Abstract | Semantic image segmentation is a central and challenging task in autonomous driving, addressed by training deep models. Since such training suffers from the curse of human-based image labeling, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies addressing an unsupervised domain adaptation (UDA) problem. In this paper, we propose a new co-training procedure for synth-to-real UDA of semantic segmentation models. It consists of a self-training stage, which provides two domain-adapted models, and a model collaboration loop for the mutual improvement of these two models. These models are then used to provide the final semantic segmentation labels (pseudo-labels) for the real-world images. The overall procedure treats the deep models as black boxes and drives their collaboration at the level of pseudo-labeled target images, i.e., neither modifying loss functions nor explicit feature alignment is required. We test our proposal on standard synthetic and real-world datasets for on-board semantic segmentation. Our procedure shows improvements ranging from ∼13 to ∼26 mIoU points over baselines, thus establishing new state-of-the-art results. (A minimal illustrative sketch of the pseudo-labeling step follows this record.) | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; no proj | Approved | no | ||
Call Number | Admin @ si @ GVL2023 | Serial | 3705 | ||
Permanent link to this record | |||||
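The co-training procedure summarized in the record above hinges on exchanging confidence-filtered pseudo-labels between two domain-adapted segmentation models. Below is a minimal NumPy sketch of that pseudo-labeling step only, not the authors' implementation; the function name, the 0.9 confidence threshold, and the ignore-label value are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): turn a peer model's soft predictions
# on an unlabeled target image into pseudo-labels, keeping only confident pixels.
import numpy as np

IGNORE_LABEL = 255  # assumed value for pixels excluded from the training loss


def peer_pseudo_labels(peer_probs: np.ndarray, conf_thresh: float = 0.9) -> np.ndarray:
    """peer_probs: (C, H, W) softmax output of the peer model.
    Returns an (H, W) label map; low-confidence pixels are set to IGNORE_LABEL."""
    conf = peer_probs.max(axis=0)                  # per-pixel confidence
    labels = peer_probs.argmax(axis=0).astype(np.int64)
    labels[conf < conf_thresh] = IGNORE_LABEL      # drop uncertain pixels
    return labels


# Toy usage: 3 classes on a 4x4 image.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
print(peer_pseudo_labels(probs))
```

In the collaboration loop described in the abstract, each model would then be retrained on the pseudo-labels produced by its peer, so that the two models improve each other without modifying their loss functions.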
Author | Gemma Rotger; Francesc Moreno-Noguer; Felipe Lumbreras; Antonio Agudo | ||||
Title | Detailed 3D face reconstruction from a single RGB image | Type | Journal | ||
Year | 2019 | Publication | Journal of WSCG | Abbreviated Journal | JWSCG |
Volume | 27 | Issue | 2 | Pages | 103-112
Keywords | 3D Wrinkle Reconstruction; Face Analysis; Optimization | ||||
Abstract | This paper introduces a method to obtain a detailed 3D reconstruction of facial skin from a single RGB image. To this end, we propose the exclusive use of an input image without requiring any information about the observed material or training data to model the wrinkle properties. They are detected and characterized directly from the image via a simple and effective parametric model, determining several features such as location, orientation, width, and height. With these ingredients, we propose to minimize a photometric error to retrieve the final detailed 3D map, which is initialized by current techniques based on deep learning. In contrast with other approaches, we only require estimating a depth parameter, making our approach fast and intuitive. Extensive experimental evaluation is presented on a wide variety of synthetic and real images, including different skin properties and facial expressions. In all cases, our method outperforms the current approaches regarding 3D reconstruction accuracy, providing striking results for both large and fine wrinkles. | ||||
Address | 2019/11 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU; 600.086; 600.130; 600.122 | Approved | no | ||
Call Number | Admin @ si @ | Serial | 3708 | ||
Permanent link to this record | |||||
Author | Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the 37th AAAI Conference on Artificial Intelligence | Abbreviated Journal | |
Volume | 37 | Issue | 2 | Pages |
Keywords | Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning | ||||
Abstract | In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | AAAI | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ SBM2023 | Serial | 3848 | ||
Permanent link to this record | |||||
Author | Mickael Coustaty; Alicia Fornes | ||||
Title | Document Analysis and Recognition – ICDAR 2023 Workshops | Type | Book Whole | ||
Year | 2023 | Publication | Document Analysis and Recognition – ICDAR 2023 Workshops | Abbreviated Journal | |
Volume | 14194 | Issue | 2 | Pages |
Keywords | |||||
Abstract | |||||
Address | San Jose; USA; August 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ CoF2023 | Serial | 3852 | ||
Permanent link to this record | |||||
Author | M. Altillawi; S. Li; S.M. Prakhya; Z. Liu; Joan Serrat | ||||
Title | Implicit Learning of Scene Geometry From Poses for Global Localization | Type | Journal Article | ||
Year | 2024 | Publication | IEEE Robotics and Automation Letters | Abbreviated Journal | ROBOTAUTOMLET |
Volume | 9 | Issue | 2 | Pages | 955-962
Keywords | Localization; Localization and mapping; Deep learning for visual perception; Visual learning | ||||
Abstract | Global visual localization estimates the absolute pose of a camera using a single image, in a previously mapped area. Obtaining the pose from a single image enables many robotics and augmented/virtual reality applications. Inspired by the latest advances in deep learning, many existing approaches directly learn and regress 6 DoF pose from an input image. However, these methods do not fully utilize the underlying scene geometry for pose regression. The challenge in monocular relocalization is the minimal availability of supervised training data, which is just the corresponding 6 DoF poses of the images. In this letter, we propose to utilize these minimal available labels (i.e., poses) to learn the underlying 3D geometry of the scene and use the geometry to estimate the 6 DoF camera pose. We present a learning method that uses these pose labels and rigid alignment to learn two 3D geometric representations (X, Y, Z coordinates) of the scene, one in the camera coordinate frame and the other in the global coordinate frame. Given a single image, it estimates these two 3D scene representations, which are then aligned to estimate a pose that matches the pose label. This formulation allows for the active inclusion of additional learning constraints to minimize 3D alignment errors between the two 3D scene representations, and 2D re-projection errors between the 3D global scene representation and 2D image pixels, resulting in improved localization accuracy. During inference, our model estimates the 3D scene geometry in camera and global frames and aligns them rigidly to obtain the pose in real time. We evaluate our work on three common visual localization datasets, conduct ablation studies, and show that our method exceeds state-of-the-art regression methods' pose accuracy on all datasets. (A minimal illustrative sketch of the rigid-alignment step follows this record.) | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 2377-3766 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ADAS | Approved | no | ||
Call Number | Admin @ si @ | Serial | 3857 | ||
Permanent link to this record | |||||
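The localization method summarized above recovers the camera pose by rigidly aligning the predicted camera-frame and global-frame scene coordinates. Below is a minimal NumPy sketch of such a rigid (Kabsch/Procrustes) alignment; this is the generic textbook procedure under the assumption of corresponding, outlier-free points, not the authors' implementation, and the names are illustrative.

```python
# Illustrative sketch (generic Kabsch alignment, not the paper's code): estimate
# the rotation R and translation t that best map camera-frame points onto
# global-frame points in the least-squares sense.
import numpy as np


def rigid_align(cam_pts: np.ndarray, world_pts: np.ndarray):
    """Both inputs are (N, 3) with row-wise correspondences.
    Returns R (3x3) and t (3,) such that world_pts ~= cam_pts @ R.T + t."""
    mu_c, mu_w = cam_pts.mean(axis=0), world_pts.mean(axis=0)
    H = (cam_pts - mu_c).T @ (world_pts - mu_w)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t


# Toy check: recover a known rotation and translation from noiseless points.
rng = np.random.default_rng(1)
P = rng.normal(size=(50, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.5, -1.0, 2.0])
R_est, t_est = rigid_align(P, P @ R_true.T + t_true)
assert np.allclose(R_est, R_true, atol=1e-6) and np.allclose(t_est, t_true, atol=1e-6)
```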
Author | Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the 37th AAAI Conference on Artificial Intelligence | Abbreviated Journal | |
Volume | 37 | Issue | 2 | Pages | 1940-1948
Keywords | |||||
Abstract | Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model. | ||||
Address | Washington; USA; February 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | AAAI | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ NBM2023 | Serial | 3860 | ||
Permanent link to this record | |||||
Author | Wenjuan Gong; Yue Zhang; Wei Wang; Peng Cheng; Jordi Gonzalez | ||||
Title | Meta-MMFNet: Meta-learning-based Multi-model Fusion Network for Micro-expression Recognition | Type | Journal Article | ||
Year | 2023 | Publication | ACM Transactions on Multimedia Computing, Communications, and Applications | Abbreviated Journal | TMCCA |
Volume | 20 | Issue | 2 | Pages | 1-20
Keywords | |||||
Abstract | Despite its wide applications in criminal investigations and clinical communications with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and the class imbalance problem. In this study, we proposed a meta-learning-based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame difference and optical flow features were fused, deep features were extracted from the fused feature, and finally, within the meta-learning-based framework, a weighted-sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method. (A minimal illustrative sketch of weighted-sum fusion follows this record.) | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ GZW2023 | Serial | 3862 | ||
Permanent link to this record | |||||
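The Meta-MMFNet record above mentions weighted-sum model fusion of the frame-difference and optical-flow branches for the final classification. Below is a minimal sketch of that fusion idea, assuming each branch outputs per-sample class probabilities; the 0.6 weight and the function name are assumptions for the example, not values from the paper.

```python
# Illustrative sketch (not the paper's code): weighted-sum fusion of two
# classifiers' class-probability outputs, followed by renormalization.
import numpy as np


def fuse_predictions(p_frame_diff: np.ndarray, p_optical_flow: np.ndarray,
                     w: float = 0.6) -> np.ndarray:
    """Each input is (N, C) class probabilities; w weights the first branch."""
    fused = w * p_frame_diff + (1.0 - w) * p_optical_flow
    return fused / fused.sum(axis=1, keepdims=True)


# Toy usage: the two branches disagree on a single 3-class sample.
p1 = np.array([[0.7, 0.2, 0.1]])
p2 = np.array([[0.4, 0.5, 0.1]])
print(fuse_predictions(p1, p2))            # fused probabilities
print(fuse_predictions(p1, p2).argmax(1))  # fused class decision
```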
Author | Marta Diez-Ferrer; Debora Gil; Elena Carreño; Susana Padrones; Samantha Aso | ||||
Title | Positive Airway Pressure-Enhanced CT to Improve Virtual Bronchoscopic Navigation | Type | Journal Article | ||
Year | 2017 | Publication | Journal of Thoracic Oncology | Abbreviated Journal | JTO |
Volume | 12 | Issue | 1S | Pages | S596-S597
Keywords | Thorax CT; diagnosis; Peripheral Pulmonary Nodule | ||||
Abstract | A main weakness of virtual bronchoscopic navigation (VBN) is unsuccessful segmentation of distal branches approaching peripheral pulmonary nodules (PPN). CT scan acquisition protocol is pivotal for segmentation covering the utmost periphery. We hypothesize that application of continuous positive airway pressure (CPAP) during CT acquisition could improve visualization and segmentation of peripheral bronchi. The purpose of the present pilot study is to compare quality of segmentations under 4 CT acquisition modes: inspiration (INSP), expiration (EXP) and both with CPAP (INSP-CPAP and EXP-CPAP). | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM; 600.096; 600.075; 600.145 | Approved | no | ||
Call Number | Admin @ si @ DGC2017a | Serial | 2883 | ||
Permanent link to this record |