|
Records |
Links |
|
Author |
Reuben Dorent; Aaron Kujawa; Marina Ivory; Spyridon Bakas; Nikola Rieke; Samuel Joutard; Ben Glocker; Jorge Cardoso; Marc Modat; Kayhan Batmanghelich; Arseniy Belkov; Maria Baldeon Calisto; Jae Won Choi; Benoit M. Dawant; Hexin Dong; Sergio Escalera; Yubo Fan; Lasse Hansen; Mattias P. Heinrich; Smriti Joshi; Victoriya Kashtanova; Hyeon Gyu Kim; Satoshi Kondo; Christian N. Kruse; Susana K. Lai-Yuen; Hao Li; Han Liu; Buntheng Ly; Ipek Oguz; Hyungseob Shin; Boris Shirokikh; Zixian Su; Guotai Wang; Jianghao Wu; Yanwu Xu; Kai Yao; Li Zhang; Sebastien Ourselin, |
|
|
Title |
CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation |
Type |
Journal Article |
|
Year |
2023 |
Publication |
Medical Image Analysis |
Abbreviated Journal |
MIA |
|
|
Volume |
83 |
Issue |
|
Pages |
102628 |
|
|
Keywords |
Domain Adaptation; Segmen tation; Vestibular Schwnannoma |
|
|
Abstract |
Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice – VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice – VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ DKI2023 |
Serial |
3706 |
|
Permanent link to this record |
|
|
|
|
Author |
Jose Luis Gomez; Gabriel Villalonga; Antonio Lopez |
|
|
Title |
Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models |
Type |
Journal Article |
|
Year |
2023 |
Publication |
Sensors – Special Issue on “Machine Learning for Autonomous Driving Perception and Prediction” |
Abbreviated Journal |
SENS |
|
|
Volume |
23 |
Issue |
2 |
Pages |
621 |
|
|
Keywords |
Domain adaptation; semi-supervised learning; Semantic segmentation; Autonomous driving |
|
|
Abstract |
Semantic image segmentation is a central and challenging task in autonomous driving, addressed by training deep models. Since this training draws to a curse of human-based image labeling, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies to address an unsupervised domain adaptation (UDA) problem. In this paper, we propose a new co-training procedure for synth-to-real UDA of semantic
segmentation models. It consists of a self-training stage, which provides two domain-adapted models, and a model collaboration loop for the mutual improvement of these two models. These models are then used to provide the final semantic segmentation labels (pseudo-labels) for the real-world images. The overall
procedure treats the deep models as black boxes and drives their collaboration at the level of pseudo-labeled target images, i.e., neither modifying loss functions is required, nor explicit feature alignment. We test our proposal on standard synthetic and real-world datasets for on-board semantic segmentation. Our
procedure shows improvements ranging from ∼13 to ∼26 mIoU points over baselines, so establishing new state-of-the-art results. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ GVL2023 |
Serial |
3705 |
|
Permanent link to this record |
|
|
|
|
Author |
Alex Gomez-Villa; Bartlomiej Twardowski; Lu Yu; Andrew Bagdanov; Joost Van de Weijer |
|
|
Title |
Continually Learning Self-Supervised Representations With Projected Functional Regularization |
Type |
Conference Article |
|
Year |
2022 |
Publication |
CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition) |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
3866-3876 |
|
|
Keywords |
Computer vision; Conferences; Self-supervised learning; Image representation; Pattern recognition |
|
|
Abstract |
Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally – they are, in fact, mostly used only as a pre-training phase over IID data. In this work we investigate self-supervised methods in continual learning regimes without any replay
mechanism. We show that naive functional regularization,also known as feature distillation, leads to lower plasticity and limits continual learning performance. Instead, we propose Projected Functional Regularization in which a separate temporal projection network ensures that the newly learned feature space preserves information of the previous one, while at the same time allowing for the learning of new features. This prevents forgetting while maintaining the plasticity of the learner. Comparison with other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in
different scenarios and on multiple datasets. |
|
|
Address |
New Orleans, USA; 20 June 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPRW |
|
|
Notes |
LAMP: 600.147; 600.120 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GTY2022 |
Serial |
3704 |
|
Permanent link to this record |
|
|
|
|
Author |
Javad Zolfaghari Bengar; Joost Van de Weijer; Laura Lopez-Fuentes; Bogdan Raducanu |
|
|
Title |
Class-Balanced Active Learning for Image Classification |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Active learning aims to reduce the labeling effort that is required to train algorithms by learning an acquisition function selecting the most relevant data for which a label should be requested from a large unlabeled data pool. Active learning is generally studied on balanced datasets where an equal amount of images per class is available. However, real-world datasets suffer from severe imbalanced classes, the so called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we proposed a general optimization framework that explicitly takes class-balancing into account. Results on three datasets showed that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we showed that also on balanced datasets
our method 1 generally results in a performance gain. |
|
|
Address |
Virtual; Waikoloa; Hawai; USA; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
LAMP; 602.200; 600.147; 600.120 |
Approved |
no |
|
|
Call Number |
Admin @ si @ ZWL2022 |
Serial |
3703 |
|
Permanent link to this record |
|
|
|
|
Author |
Juan Borrego-Carazo; Carles Sanchez; David Castells; Jordi Carrabina; Debora Gil |
|
|
Title |
BronchoPose: an analysis of data and model configuration for vision-based bronchoscopy pose estimation |
Type |
Journal Article |
|
Year |
2023 |
Publication |
Computer Methods and Programs in Biomedicine |
Abbreviated Journal |
CMPB |
|
|
Volume |
228 |
Issue |
|
Pages |
107241 |
|
|
Keywords |
Videobronchoscopy guiding; Deep learning; Architecture optimization; Datasets; Standardized evaluation framework; Pose estimation |
|
|
Abstract |
Vision-based bronchoscopy (VB) models require the registration of the virtual lung model with the frames from the video bronchoscopy to provide effective guidance during the biopsy. The registration can be achieved by either tracking the position and orientation of the bronchoscopy camera or by calibrating its deviation from the pose (position and orientation) simulated in the virtual lung model. Recent advances in neural networks and temporal image processing have provided new opportunities for guided bronchoscopy. However, such progress has been hindered by the lack of comparative experimental conditions.
In the present paper, we share a novel synthetic dataset allowing for a fair comparison of methods. Moreover, this paper investigates several neural network architectures for the learning of temporal information at different levels of subject personalization. In order to improve orientation measurement, we also present a standardized comparison framework and a novel metric for camera orientation learning. Results on the dataset show that the proposed metric and architectures, as well as the standardized conditions, provide notable improvements to current state-of-the-art camera pose estimation in video bronchoscopy. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Elsevier |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM; |
Approved |
no |
|
|
Call Number |
Admin @ si @ BSC2023 |
Serial |
3702 |
|
Permanent link to this record |
|
|
|
|
Author |
Y. Mori; M.Misawa; Jorge Bernal; M. Bretthauer; S.Kudo; A. Rastogi; Gloria Fernandez Esparrach |
|
|
Title |
Artificial Intelligence for Disease Diagnosis-the Gold Standard Challenge |
Type |
Journal Article |
|
Year |
2022 |
Publication |
Gastrointestinal Endoscopy |
Abbreviated Journal |
|
|
|
Volume |
96 |
Issue |
2 |
Pages |
370-372 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE |
Approved |
no |
|
|
Call Number |
Admin @ si @ MMB2022 |
Serial |
3701 |
|
Permanent link to this record |
|
|
|
|
Author |
Bojana Gajic; Ariel Amato; Ramon Baldrich; Joost Van de Weijer; Carlo Gatta |
|
|
Title |
Area Under the ROC Curve Maximization for Metric Learning |
Type |
Conference Article |
|
Year |
2022 |
Publication |
CVPR 2022 Workshop on Efficien Deep Learning for Computer Vision (ECV 2022, 5th Edition) |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Training; Computer vision; Conferences; Area measurement; Benchmark testing; Pattern recognition |
|
|
Abstract |
Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (which is a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work that proved that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximated, derivable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves comparable performance to more complex, domain specific, state-of-the-art methods for vehicle re-identification. |
|
|
Address |
New Orleans, USA; 20 June 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPRW |
|
|
Notes |
CIC; LAMP; |
Approved |
no |
|
|
Call Number |
Admin @ si @ GAB2022 |
Serial |
3700 |
|
Permanent link to this record |
|
|
|
|
Author |
Guillermo Torres; Sonia Baeza; Carles Sanchez; Ignasi Guasch; Antoni Rosell; Debora Gil |
|
|
Title |
An Intelligent Radiomic Approach for Lung Cancer Screening |
Type |
Journal Article |
|
Year |
2022 |
Publication |
Applied Sciences |
Abbreviated Journal |
APPLSCI |
|
|
Volume |
12 |
Issue |
3 |
Pages |
1568 |
|
|
Keywords |
Lung cancer; Early diagnosis; Screening; Neural networks; Image embedding; Architecture optimization |
|
|
Abstract |
The efficiency of lung cancer screening for reducing mortality is hindered by the high rate of false positives. Artificial intelligence applied to radiomics could help to early discard benign cases from the analysis of CT scans. The available amount of data and the fact that benign cases are a minority, constitutes a main challenge for the successful use of state of the art methods (like deep learning), which can be biased, over-fitted and lack of clinical reproducibility. We present an hybrid approach combining the potential of radiomic features to characterize nodules in CT scans and the generalization of the feed forward networks. In order to obtain maximal reproducibility with minimal training data, we propose an embedding of nodules based on the statistical significance of radiomic features for malignancy detection. This representation space of lesions is the input to a feed
forward network, which architecture and hyperparameters are optimized using own-defined metrics of the diagnostic power of the whole system. Results of the best model on an independent set of patients achieve 100% of sensitivity and 83% of specificity (AUC = 0.94) for malignancy detection. |
|
|
Address |
Jan 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM; 600.139; 600.145 |
Approved |
no |
|
|
Call Number |
Admin @ si @ TBS2022 |
Serial |
3699 |
|
Permanent link to this record |
|
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera; Vassilis Athitsos; Mohammad Sabokrou |
|
|
Title |
All You Need In Sign Language Production |
Type |
Miscellaneous |
|
Year |
2022 |
Publication |
Arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Sign Language Production; Sign Language Recog- nition; Sign Language Translation; Deep Learning; Survey; Deaf |
|
|
Abstract |
Sign Language is the dominant form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental.
To this end, sign language recognition and production are two necessary parts for making such a two-way system. Signlanguage recognition and production need to cope with some critical challenges. In this survey, we review recent advances in
Sign Language Production (SLP) and related areas using deep learning. To have more realistic perspectives to sign language, we present an introduction to the Deaf culture, Deaf centers, psychological perspective of sign language, the main differences between spoken language and sign language. Furthermore, we present the fundamental components of a bi-directional sign language translation system, discussing the main challenges in this area. Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented. Finally, a general framework for SLP and performance evaluation, and also a discussion on the recent developments, advantages, and limitations in SLP, commenting on possible lines for future research are presented. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA; no menciona |
Approved |
no |
|
|
Call Number |
Admin @ si @ RKE2022c |
Serial |
3698 |
|
Permanent link to this record |
|
|
|
|
Author |
Miquel Angel Piera; Jose Luis Muñoz; Debora Gil; Gonzalo Martin; Jordi Manzano |
|
|
Title |
A Socio-Technical Simulation Model for the Design of the Future Single Pilot Cockpit: An Opportunity to Improve Pilot Performance |
Type |
Journal Article |
|
Year |
2022 |
Publication |
IEEE Access |
Abbreviated Journal |
ACCESS |
|
|
Volume |
10 |
Issue |
|
Pages |
22330-22343 |
|
|
Keywords |
Human factors ; Performance evaluation ; Simulation; Sociotechnical systems ; System performance |
|
|
Abstract |
The future deployment of single pilot operations must be supported by new cockpit computer services. Such services require an adaptive context-aware integration of technical functionalities with the concurrent tasks that a pilot must deal with. Advanced artificial intelligence supporting services and improved communication capabilities are the key enabling technologies that will render future cockpits more integrated with the present digitalized air traffic management system. However, an issue in the integration of such technologies is the lack of socio-technical analysis in the design of these teaming mechanisms. A key factor in determining how and when a service support should be provided is the dynamic evolution of pilot workload. This paper investigates how the socio-technical model-based systems engineering approach paves the way for the design of a digital assistant framework by formalizing this workload. The model was validated in an Airbus A-320 cockpit simulator, and the results confirmed the degraded pilot behavioral model and the performance impact according to different contextual flight deck information. This study contributes to practical knowledge for designing human-machine task-sharing systems. |
|
|
Address |
Feb 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM; |
Approved |
no |
|
|
Call Number |
Admin @ si @ PMG2022 |
Serial |
3697 |
|
Permanent link to this record |
|
|
|
|
Author |
David Berga; Xavier Otazu |
|
|
Title |
A neurodynamic model of saliency prediction in v1 |
Type |
Journal Article |
|
Year |
2022 |
Publication |
Neural Computation |
Abbreviated Journal |
NEURALCOMPUT |
|
|
Volume |
34 |
Issue |
2 |
Pages |
378-414 |
|
|
Keywords |
|
|
|
Abstract |
Lateral connections in the primary visual cortex (V1) have long been hypothesized to be responsible for several visual processing mechanisms such as brightness induction, chromatic induction, visual discomfort, and bottom-up visual attention (also named saliency). Many computational models have been developed to independently predict these and other visual processes, but no computational model has been able to reproduce all of them simultaneously. In this work, we show that a biologically plausible computational model of lateral interactions of V1 is able to simultaneously predict saliency and all the aforementioned visual processes. Our model's architecture (NSWAM) is based on Penacchio's neurodynamic model of lateral connections of V1. It is defined as a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation, and scale. We tested NSWAM saliency predictions using images from several eye tracking data sets. We show that the accuracy of predictions obtained by our architecture, using shuffled metrics, is similar to other state-of-the-art computational methods, particularly with synthetic images (CAT2000-Pattern and SID4VAM) that mainly contain low-level features. Moreover, we outperform other biologically inspired saliency models that are specifically designed to exclusively reproduce saliency. We show that our biologically plausible model of lateral connections can simultaneously explain different visual processes present in V1 (without applying any type of training or optimization and keeping the same parameterization for all the visual processes). This can be useful for the definition of a unified architecture of the primary visual cortex. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
NEUROBIT; 600.128; 600.120 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BeO2022 |
Serial |
3696 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Brugues Pujolras; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
A Multilingual Approach to Scene Text Visual Question Answering |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Document Analysis Systems.15th IAPR International Workshop, (DAS2022) |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
65-79 |
|
|
Keywords |
Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning |
|
|
Abstract |
Scene Text Visual Question Answering (ST-VQA) has recently emerged as a hot research topic in Computer Vision. Current ST-VQA models have a big potential for many types of applications but lack the ability to perform well on more than one language at a time due to the lack of multilingual data, as well as the use of monolingual word embeddings for training. In this work, we explore the possibility to obtain bilingual and multilingual VQA models. In that regard, we use an already established VQA model that uses monolingual word embeddings as part of its pipeline and substitute them by FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that match the performance of the respective monolingual baselines. |
|
|
Address |
La Rochelle, France; May 22–25, 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG; 611.004; 600.155; 601.002 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BGK2022b |
Serial |
3695 |
|
Permanent link to this record |
|
|
|
|
Author |
Adria Molina; Lluis Gomez; Oriol Ramos Terrades; Josep Llados |
|
|
Title |
A Generic Image Retrieval Method for Date Estimation of Historical Document Collections |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Document Analysis Systems.15th IAPR International Workshop, (DAS2022) |
Abbreviated Journal |
|
|
|
Volume |
13237 |
Issue |
|
Pages |
583–597 |
|
|
Keywords |
Date estimation; Document retrieval; Image retrieval; Ranking loss; Smooth-nDCG |
|
|
Abstract |
Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack of the ability to generalize from one dataset to others. This paper presents a robust date estimation system based in a retrieval approach that generalizes well in front of heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordination of documents for each problem. One of the main usages of the presented approach is as a tool for historical contextual retrieval. It means that scholars could perform comparative analysis of historical images from big datasets in terms of the period where they were produced. We provide experimental evaluation on different types of documents from real datasets of manuscript and newspaper images. |
|
|
Address |
La Rochelle, France; May 22–25, 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG; 600.140; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MGR2022 |
Serial |
3694 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ramzy Ibrahim; Robert Benavente; Felipe Lumbreras; Daniel Ponsa |
|
|
Title |
3DRRDB: Super Resolution of Multiple Remote Sensing Images using 3D Residual in Residual Dense Blocks |
Type |
Conference Article |
|
Year |
2022 |
Publication |
CVPR 2022 Workshop on IEEE Perception Beyond the Visible Spectrum workshop series (PBVS, 18th Edition) |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Training; Solid modeling; Three-dimensional displays; PSNR; Convolution; Superresolution; Pattern recognition |
|
|
Abstract |
The rapid advancement of Deep Convolutional Neural Networks helped in solving many remote sensing problems, especially the problems of super-resolution. However, most state-of-the-art methods focus more on Single Image Super-Resolution neglecting Multi-Image Super-Resolution. In this work, a new proposed 3D Residual in Residual Dense Blocks model (3DRRDB) focuses on remote sensing Multi-Image Super-Resolution for two different single spectral bands. The proposed 3DRRDB model explores the idea of 3D convolution layers in deeply connected Dense Blocks and the effect of local and global residual connections with residual scaling in Multi-Image Super-Resolution. The model tested on the Proba-V challenge dataset shows a significant improvement above the current state-of-the-art models scoring a Corrected Peak Signal to Noise Ratio (cPSNR) of 48.79 dB and 50.83 dB for Near Infrared (NIR) and RED Bands respectively. Moreover, the proposed 3DRRDB model scores a Corrected Structural Similarity Index Measure (cSSIM) of 0.9865 and 0.9909 for NIR and RED bands respectively. |
|
|
Address |
New Orleans, USA; 19 June 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPRW |
|
|
Notes |
MSIAU; 600.130 |
Approved |
no |
|
|
Call Number |
Admin @ si @ IBL2022 |
Serial |
3693 |
|
Permanent link to this record |
|
|
|
|
Author |
Wenjuan Gong; Zhang Yue; Wei Wang; Cheng Peng; Jordi Gonzalez |
|
|
Title |
Meta-MMFNet: Meta-Learning Based Multi-Model Fusion Network for Micro-Expression Recognition |
Type |
Journal Article |
|
Year |
2022 |
Publication |
ACM Transactions on Multimedia Computing, Communications, and Applications |
Abbreviated Journal |
ACMTMC |
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Feature Fusion; Model Fusion; Meta-Learning; Micro-Expression Recognition |
|
|
Abstract |
Despite its wide applications in criminal investigations and clinical communications with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and imbalanced classes problems. In this study, we proposed a meta-learning based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame difference and optical flow features were fused, deep features were extracted from the fused feature, and finally in the meta-learning-based framework, weighted sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method. |
|
|
Address |
May 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ISE; 600.157 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GYW2022 |
Serial |
3692 |
|
Permanent link to this record |