Records | |||||
---|---|---|---|---|---|
Author | Idoia Ruiz; Joan Serrat | ||||
Title | Hierarchical Novelty Detection for Traffic Sign Recognition | Type | Journal Article | ||
Year | 2022 | Publication | Sensors | Abbreviated Journal | SENS |
Volume | 22 | Issue | 12 | Pages | 4389 |
Keywords | Novelty detection; hierarchical classification; deep learning; traffic sign recognition; autonomous driving; computer vision | ||||
Abstract | Recent works have made significant progress in novelty detection, i.e., the problem of detecting samples of novel classes, never seen during training, while classifying those that belong to known classes. However, the only information this task provides about novel samples is that they are unknown. In this work, we leverage hierarchical taxonomies of classes to provide informative outputs for samples of novel classes. We predict their closest class in the taxonomy, i.e., their parent class. We address this problem, known as hierarchical novelty detection, by proposing a novel loss, namely Hierarchical Cosine Loss, that is designed to learn class prototypes along with an embedding of discriminative features consistent with the taxonomy. We apply it to traffic sign recognition, where we predict the parent class semantics for new types of traffic signs. Our model beats state-of-the-art approaches on two large-scale traffic sign benchmarks, Mapillary Traffic Sign Dataset (MTSD) and Tsinghua-Tencent 100K (TT100K), and performs similarly on natural image benchmarks (AWA2, CUB). For TT100K and MTSD, our approach is able to detect novel samples at the correct nodes of the hierarchy with 81% and 36% accuracy, respectively, at 80% known class accuracy. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.154 | Approved | no | ||
Call Number | Admin @ si @ RuS2022 | Serial | 3684 | ||
Permanent link to this record | |||||
Author | Yasuko Sugito; Javier Vazquez; Trevor Canham; Marcelo Bertalmio | ||||
Title | Image quality evaluation in professional HDR/WCG production questions the need for HDR metrics | Type | Journal Article | ||
Year | 2022 | Publication | IEEE Transactions on Image Processing | Abbreviated Journal | TIP |
Volume | 31 | Issue | Pages | 5163 - 5177 | |
Keywords | Measurement; Image color analysis; Image coding; Production; Dynamic range; Brightness; Extraterrestrial measurements | ||||
Abstract | In the quality evaluation of high dynamic range and wide color gamut (HDR/WCG) images, a number of works have concluded that native HDR metrics, such as HDR visual difference predictor (HDR-VDP), HDR video quality metric (HDR-VQM), or convolutional neural network (CNN)-based visibility metrics for HDR content, provide the best results. These metrics consider only the luminance component, but several color difference metrics have been specifically developed for, and validated with, HDR/WCG images. In this paper, we perform subjective evaluation experiments in a professional HDR/WCG production setting, under a real use case scenario. The results are quite relevant in that they show, firstly, that the performance of HDR metrics is worse than that of a classic, simple standard dynamic range (SDR) metric applied directly to the HDR content; and secondly, that the chrominance metrics specifically developed for HDR/WCG imaging have poor correlation with observer scores and are also outperformed by an SDR metric. Based on these findings, we show how a very simple framework for creating color HDR metrics, that uses only luminance SDR metrics, transfer functions, and classic color spaces, is able to consistently outperform, by a considerable margin, state-of-the-art HDR metrics on a varied set of HDR content, for both perceptual quantization (PQ) and Hybrid Log-Gamma (HLG) encoding, luminance and chroma distortions, and on different color spaces of common use. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | 600.161; 611.007 | Approved | no | ||
Call Number | Admin @ si @ SVG2022 | Serial | 3683 | ||
Permanent link to this record | |||||
Author | Alex Gomez-Villa; Adrian Martin; Javier Vazquez; Marcelo Bertalmio; Jesus Malo | ||||
Title | On the synthesis of visual illusions using deep generative models | Type | Journal Article | ||
Year | 2022 | Publication | Journal of Vision | Abbreviated Journal | JOV |
Volume | 22(8) | Issue | 2 | Pages | 1-18 |
Keywords | |||||
Abstract | Visual illusions expand our understanding of the visual system by imposing constraints in the models in two different ways: i) visual illusions for humans should induce equivalent illusions in the model, and ii) illusions synthesized from the model should be compelling for human viewers too. These constraints are alternative strategies to find good vision models. Following the first research strategy, recent studies have shown that artificial neural network architectures also have human-like illusory percepts when stimulated with classical hand-crafted stimuli designed to fool humans. In this work we focus on the second (less explored) strategy: we propose a framework to synthesize new visual illusions using the optimization abilities of current automatic differentiation techniques. The proposed framework can be used with classical vision models as well as with more recent artificial neural network architectures. This framework, validated by psychophysical experiments, can be used to study the difference between a vision model and the actual human perception and to optimize the vision model to decrease this difference. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.161; 611.007 | Approved | no | ||
Call Number | Admin @ si @ GMV2022 | Serial | 3682 | ||
Permanent link to this record | |||||
Author | AN Ruchai; VI Kober; KA Dorofeev; VN Karnaukhov; Mikhail Mozerov | ||||
Title | Classification of breast abnormalities using a deep convolutional neural network and transfer learning | Type | Journal Article | ||
Year | 2021 | Publication | Journal of Communications Technology and Electronics | Abbreviated Journal | |
Volume | 66 | Issue | 6 | Pages | 778–783 |
Keywords | |||||
Abstract | A new algorithm for classification of breast pathologies in digital mammography using a convolutional neural network and transfer learning is proposed. The following pretrained neural networks were chosen: MobileNetV2, InceptionResNetV2, Xception, and ResNetV2. All mammographic images were pre-processed to improve classification reliability. Transfer training was carried out using additional data augmentation and fine-tuning. The performance of the proposed algorithm for classification of breast pathologies in terms of accuracy on real data is discussed and compared with that of state-of-the-art algorithms on the available MIAS database. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; | Approved | no | ||
Call Number | Admin @ si @ RKD2022 | Serial | 3680 | ||
Permanent link to this record | |||||
Author | Fei Yang; Yaxing Wang; Luis Herranz; Yongmei Cheng; Mikhail Mozerov | ||||
Title | A Novel Framework for Image-to-image Translation and Image Compression | Type | Journal Article | ||
Year | 2022 | Publication | Neurocomputing | Abbreviated Journal | NEUCOM |
Volume | 508 | Issue | Pages | 58-70 | |
Keywords | |||||
Abstract | Data-driven paradigms using machine learning are becoming ubiquitous in image processing and communications. In particular, image-to-image (I2I) translation is a generic and widely used approach to image processing problems, such as image synthesis, style transfer, and image restoration. At the same time, neural image compression has emerged as a data-driven alternative to traditional coding approaches in visual communications. In this paper, we study the combination of these two paradigms into a joint I2I compression and translation framework, focusing on multi-domain image synthesis. We first propose distributed I2I translation by integrating quantization and entropy coding into an I2I translation framework (i.e. I2Icodec). In practice, the image compression functionality (i.e. autoencoding) is also desirable, which requires deploying a regular image codec alongside I2Icodec. Thus, we further propose a unified framework that allows both translation and autoencoding capabilities in a single codec. Adaptive residual blocks conditioned on the translation/compression mode provide flexible adaptation to the desired functionality. The experiments show promising results in both I2I translation and image compression using a single model. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ YWH2022 | Serial | 3679 | ||
Permanent link to this record | |||||
Author | O.F.Ahmad; Y.Mori; M.Misawa; S.Kudo; J.T.Anderson; Jorge Bernal | ||||
Title | Establishing key research questions for the implementation of artificial intelligence in colonoscopy: a modified Delphi method | Type | Journal Article | ||
Year | 2021 | Publication | Endoscopy | Abbreviated Journal | END |
Volume | 53 | Issue | 9 | Pages | 893-901 |
Keywords | |||||
Abstract | BACKGROUND : Artificial intelligence (AI) research in colonoscopy is progressing rapidly but widespread clinical implementation is not yet a reality. We aimed to identify the top implementation research priorities. METHODS : An established modified Delphi approach for research priority setting was used. Fifteen international experts, including endoscopists and translational computer scientists/engineers, from nine countries participated in an online survey over 9 months. Questions related to AI implementation in colonoscopy were generated as a long-list in the first round, and then scored in two subsequent rounds to identify the top 10 research questions. RESULTS : The top 10 ranked questions were categorized into five themes. Theme 1: clinical trial design/end points (4 questions), related to optimum trial designs for polyp detection and characterization, determining the optimal end points for evaluation of AI, and demonstrating impact on interval cancer rates. Theme 2: technological developments (3 questions), including improving detection of more challenging and advanced lesions, reduction of false-positive rates, and minimizing latency. Theme 3: clinical adoption/integration (1 question), concerning the effective combination of detection and characterization into one workflow. Theme 4: data access/annotation (1 question), concerning more efficient or automated data annotation methods to reduce the burden on human experts. Theme 5: regulatory approval (1 question), related to making regulatory approval processes more efficient. CONCLUSIONS : This is the first reported international research priority setting exercise for AI in colonoscopy. The study findings should be used as a framework to guide future research with key stakeholders to accelerate the clinical implementation of AI in endoscopy. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ AMM2021 | Serial | 3670 | ||
Permanent link to this record | |||||
Author | F.Negin; Pau Rodriguez; M.Koperski; A.Kerboua; Jordi Gonzalez; J.Bourgeois; E.Chapoulie; P.Robert; F.Bremond | ||||
Title | PRAXIS: Towards automatic cognitive assessment using gesture recognition | Type | Journal Article | ||
Year | 2018 | Publication | Expert Systems with Applications | Abbreviated Journal | ESWA |
Volume | 106 | Issue | Pages | 21-35 | |
Keywords | |||||
Abstract | The Praxis test is a gesture-based diagnostic test which has been accepted as diagnostically indicative of cortical pathologies such as Alzheimer’s disease. Despite being simple, this test is oftentimes skipped by clinicians. In this paper, we propose a novel framework to investigate the potential of static and dynamic upper-body gestures based on the Praxis test and their potential in a medical framework to automate the test procedures for computer-assisted cognitive assessment of older adults. In order to carry out gesture recognition as well as correctness assessment of the performances, we have collected a novel challenging RGB-D gesture video dataset recorded by Kinect v2, which contains 29 specific gestures suggested by clinicians and recorded from both experts and patients performing the gesture set. Moreover, we propose a framework to learn the dynamics of upper-body gestures, considering the videos as sequences of short-term clips of gestures. Our approach first uses body part detection to extract image patches surrounding the hands and then, by means of a fine-tuned convolutional neural network (CNN) model, it learns deep hand features which are then linked to a long short-term memory to capture the temporal dependencies between video frames. We report the results of four developed methods using different modalities. The experiments show the effectiveness of our deep learning based approach in gesture recognition and performance assessment tasks. Satisfaction of clinicians from the assessment reports indicates the impact of the framework corresponding to the diagnosis. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ NRK2018 | Serial | 3669 | ||
Permanent link to this record | |||||
Author | Diana Ramirez Cifuentes; Ana Freire; Ricardo Baeza Yates; Nadia Sanz Lamora; Aida Alvarez; Alexandre Gonzalez; Meritxell Lozano; Roger Llobet; Diego Velazquez; Josep M. Gonfaus; Jordi Gonzalez | ||||
Title | Characterization of Anorexia Nervosa on Social Media: Textual, Visual, Relational, Behavioral, and Demographical Analysis | Type | Journal Article | ||
Year | 2021 | Publication | Journal of Medical Internet Research | Abbreviated Journal | JMIR |
Volume | 23 | Issue | 7 | Pages | e25925 |
Keywords | |||||
Abstract | Background: Eating disorders are psychological conditions characterized by unhealthy eating habits. Anorexia nervosa (AN) is defined as the belief of being overweight despite being dangerously underweight. The psychological signs involve emotional and behavioral issues. There is evidence that signs and symptoms can manifest on social media, wherein both harmful and beneficial content is shared daily. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ RFB2021 | Serial | 3665 | ||
Permanent link to this record | |||||
Author | Diego Velazquez; Josep M. Gonfaus; Pau Rodriguez; Xavier Roca; Seiichi Ozawa; Jordi Gonzalez | ||||
Title | Logo Detection With No Priors | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Access | Abbreviated Journal | ACCESS |
Volume | 9 | Issue | Pages | 106998-107011 | |
Keywords | |||||
Abstract | In recent years, top referred methods on object detection like R-CNN have implemented this task as a combination of proposal region generation and supervised classification on the proposed bounding boxes. Although this pipeline has achieved state-of-the-art results in multiple datasets, it has inherent limitations that make object detection a very complex and inefficient task in computational terms. Instead of considering this standard strategy, in this paper we enhance Detection Transformers (DETR) which tackles object detection as a set-prediction problem directly in an end-to-end fully differentiable pipeline without requiring priors. In particular, we incorporate Feature Pyramids (FP) to the DETR architecture and demonstrate the effectiveness of the resulting DETR-FP approach on improving logo detection results thanks to the improved detection of small logos. So, without requiring any domain specific prior to be fed to the model, DETR-FP obtains competitive results on the OpenLogo and MS-COCO datasets offering a relative improvement of up to 30%, when compared to a Faster R-CNN baseline which strongly depends on hand-designed priors. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ VGR2021 | Serial | 3664 | ||
Permanent link to this record | |||||
Author | Kaustubh Kulkarni; Ciprian Corneanu; Ikechukwu Ofodile; Sergio Escalera; Xavier Baro; Sylwia Hyniewska; Juri Allik; Gholamreza Anbarjafari | ||||
Title | Automatic Recognition of Facial Displays of Unfelt Emotions | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Transactions on Affective Computing | Abbreviated Journal | TAC |
Volume | 12 | Issue | 2 | Pages | 377 - 390 |
Keywords | |||||
Abstract | Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average, it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state-of-the-art results on CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on the BP4D dataset. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ KCO2021 | Serial | 3658 | ||
Permanent link to this record | |||||
Author | Fatemeh Noroozi; Ciprian Corneanu; Dorota Kamińska; Tomasz Sapiński; Sergio Escalera; Gholamreza Anbarjafari | ||||
Title | Survey on Emotional Body Gesture Recognition | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Transactions on Affective Computing | Abbreviated Journal | TAC |
Volume | 12 | Issue | 2 | Pages | 505 - 523 |
Keywords | |||||
Abstract | Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as “body language” and comment on general aspects such as gender differences and culture dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment on static and dynamic body pose estimation methods both in RGB and 3D. We then comment on the recent literature related to representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g. human detection and pose estimation) are nowadays mature technologies fully developed for robust large scale analysis, we show that for emotion recognition the quantity of labelled data is scarce, there is no agreement on clearly defined output spaces and the representations are shallow and largely based on naive geometrical representations. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ NCK2021 | Serial | 3657 | ||
Permanent link to this record | |||||
Author | Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz | ||||
Title | Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Transactions on Pattern Analysis and Machine Intelligence | Abbreviated Journal | TPAMI |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ SEL2021 | Serial | 3656 | ||
Permanent link to this record | |||||
Author | Meysam Madadi; Hugo Bertiche; Sergio Escalera | ||||
Title | Deep unsupervised 3D human body reconstruction from a sparse set of landmarks | Type | Journal Article | ||
Year | 2021 | Publication | International Journal of Computer Vision | Abbreviated Journal | IJCV |
Volume | 129 | Issue | Pages | 2499–2512 | |
Keywords | |||||
Abstract | In this paper we propose the first deep unsupervised approach in human body reconstruction to estimate body surface from a sparse set of landmarks, so-called DeepMurf. We apply a denoising autoencoder to estimate missing landmarks. Then we apply an attention model to estimate body joints from landmarks. Finally, a cascading network is applied to regress parameters of a statistical generative model that reconstructs the body. Our set of proposed loss functions allows us to train the network in an unsupervised way. Results on four public datasets show that our approach accurately reconstructs the human body from real world mocap data. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ MBE2021 | Serial | 3654 | ||
Permanent link to this record | |||||
Author | Victor M. Campello; Polyxeni Gkontra; Cristian Izquierdo; Carlos Martin-Isla; Alireza Sojoudi; Peter M. Full; Klaus Maier-Hein; Yao Zhang; Zhiqiang He; Jun Ma; Mario Parreno; Alberto Albiol; Fanwei Kong; Shawn C. Shadden; Jorge Corral Acero; Vaanathi Sundaresan; Mina Saber; Mustafa Elattar; Hongwei Li; Bjoern Menze; Firas Khader; Christoph Haarburger; Cian M. Scannell; Mitko Veta; Adam Carscadden; Kumaradevan Punithakumar; Xiao Liu; Sotirios A. Tsaftaris; Xiaoqiong Huang; Xin Yang; Lei Li; Xiahai Zhuang; David Vilades; Martin L. Descalzo; Andrea Guala; Lucia La Mura; Matthias G. Friedrich; Ria Garg; Julie Lebel; Filipe Henriques; Mahir Karakas; Ersin Cavus; Steffen E. Petersen; Sergio Escalera; Santiago Segui; Jose F. Rodriguez Palomares; Karim Lekadir | ||||
Title | Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation: The M&Ms Challenge | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Transactions on Medical Imaging | Abbreviated Journal | TMI |
Volume | 40 | Issue | 12 | Pages | 3543-3554 |
Keywords | |||||
Abstract | The emergence of deep learning has considerably advanced the state-of-the-art in cardiac magnetic resonance (CMR) segmentation. Many techniques have been proposed over the last few years, bringing the accuracy of automated segmentation close to human performance. However, these models have been all too often trained and validated using cardiac imaging samples from single clinical centres or homogeneous imaging protocols. This has prevented the development and validation of models that are generalizable across different clinical centres, imaging conditions or scanner vendors. To promote further research and scientific benchmarking in the field of generalizable deep learning for cardiac segmentation, this paper presents the results of the Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation (M&Ms) Challenge, which was recently organized as part of the MICCAI 2020 Conference. A total of 14 teams submitted different solutions to the problem, combining various baseline models, data augmentation strategies, and domain adaptation techniques. The obtained results indicate the importance of intensity-driven data augmentation, as well as the need for further research to improve generalizability towards unseen scanner vendors or new imaging protocols. Furthermore, we present a new resource of 375 heterogeneous CMR datasets acquired by using four different scanner vendors in six hospitals and three different countries (Spain, Canada and Germany), which we provide as open-access for the community to enable future research in the field. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ CGI2021 | Serial | 3653 | ||
Permanent link to this record | |||||
Author | Meysam Madadi; Sergio Escalera; Xavier Baro; Jordi Gonzalez | ||||
Title | End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth data | Type | Journal Article | ||
Year | 2022 | Publication | IET Computer Vision | Abbreviated Journal | IETCV |
Volume | 16 | Issue | 1 | Pages | 50-66 |
Keywords | Computer vision; data acquisition; human computer interaction; learning (artificial intelligence); pose estimation | ||||
Abstract | Despite recent advances in 3D pose estimation of human hands, especially thanks to the advent of CNNs and depth cameras, this task is still far from being solved. This is mainly due to the highly non-linear dynamics of fingers, which make hand model training a challenging task. In this paper, we exploit a novel hierarchical tree-like structured CNN, in which branches are trained to become specialized in predefined subsets of hand joints, called local poses. We further fuse local pose features, extracted from hierarchical CNN branches, to learn higher order dependencies among joints in the final pose by end-to-end training. Lastly, the loss function used is also defined to incorporate appearance and physical constraints about doable hand motion and deformation. Finally, we introduce a non-rigid data augmentation approach to increase the amount of training depth data. Experimental results suggest that feeding a tree-shaped CNN, specialized in local poses, into a fusion network for modeling joints correlations and dependencies, helps to increase the precision of final estimations, outperforming state-of-the-art results on NYU and SyntheticHand datasets. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; ISE; 600.098; 600.119 | Approved | no | ||
Call Number | Admin @ si @ MEB2022 | Serial | 3652 | ||
Permanent link to this record |