|
Records |
Links |
|
Author |
Arnau Baro; Carles Badal; Pau Torras; Alicia Fornes |
|
|
Title |
Handwritten Historical Music Recognition through Sequence-to-Sequence with Attention Mechanism |
Type |
Conference Article |
|
Year |
2022 |
Publication |
3rd International Workshop on Reading Music Systems (WoRMS2021) |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
55-59 |
|
|
Keywords |
Optical Music Recognition; Digits; Image Classification |
|
|
Abstract |
Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks. |
|
|
Address |
July 23, 2021, Alicante (Spain) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WoRMS |
|
|
Notes |
DAG; 600.121; 600.162; 602.230; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BBT2022 |
Serial |
3734 |
|
Permanent link to this record |
|
|
|
|
Author |
Pau Torras; Arnau Baro; Alicia Fornes; Lei Kang |
|
|
Title |
Improving Handwritten Music Recognition through Language Model Integration |
Type |
Conference Article |
|
Year |
2022 |
Publication |
4th International Workshop on Reading Music Systems (WoRMS2022) |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
42-46 |
|
|
Keywords |
optical music recognition; historical sources; diversity; music theory; digital humanities |
|
|
Abstract |
Handwritten Music Recognition, especially in the historical domain, is an inherently challenging endeavour; paper degradation artefacts and the ambiguous nature of handwriting make recognising such scores an error-prone process, even for the current state-of-the-art Sequence to Sequence models. In this work we propose a way of reducing the production of statistically implausible output sequences by fusing a Language Model into a recognition Sequence to Sequence model. The idea is leveraging visually-conditioned and context-conditioned output distributions in order to automatically find and correct any mistakes that would otherwise break context significantly. We have found this approach to improve recognition results to 25.15 SER (%) from a previous best of 31.79 SER (%) in the literature. |
|
|
Address |
November 18, 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WoRMS |
|
|
Notes |
DAG; 600.121; 600.162; 602.230 |
Approved |
no |
|
|
Call Number |
Admin @ si @ TBF2022 |
Serial |
3735 |
|
Permanent link to this record |
|
|
|
|
Author |
Henry Velesaca; Patricia Suarez; Angel Sappa; Dario Carpio; Rafael E. Rivadeneira; Angel Sanchez |
|
|
Title |
Review on Common Techniques for Urban Environment Video Analytics |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Anais do III Workshop Brasileiro de Cidades Inteligentes |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
107-118 |
|
|
Keywords |
Video Analytics; Review; Urban Environments; Smart Cities |
|
|
Abstract |
This work compiles the different computer vision-based approaches
from the state-of-the-art intended for video analytics in urban environments.
The manuscript groups the different approaches according to the typical modules present in video analysis, including image preprocessing, object detection,
classification, and tracking. This proposed pipeline serves as a basic guide to
representing these most representative approaches in this topic of video analysis
that will be addressed in this work. Furthermore, the manuscript is not intended
to be an exhaustive review of the most advanced approaches, but only a list of
common techniques proposed to address recurring problems in this field. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WBCI |
|
|
Notes |
MSIAU; 601.349 |
Approved |
no |
|
|
Call Number |
Admin @ si @ VSS2022 |
Serial |
3773 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Ali Furkan Biten; Sounak Dey; Alicia Fornes; Yousri Kessentini; Lluis Gomez; Dimosthenis Karatzas; Josep Llados |
|
|
Title |
One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Document Analysis |
|
|
Abstract |
Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This appears, for example, in the case of historical ciphered manuscripts, which are usually written with invented alphabets to hide the content. Thus, in this paper we address this problem through a data generation technique based on Bayesian Program Learning (BPL). Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet. After generating symbols, we create synthetic lines to train state-of-the-art HTR architectures in a segmentation free fashion. Quantitative and qualitative analyses were carried out and confirm the effectiveness of the proposed method, achieving competitive results compared to the usage of real annotated data. |
|
|
Address |
Virtual; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 602.230; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SBD2022 |
Serial |
3615 |
|
Permanent link to this record |
|
|
|
|
Author |
Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar |
|
|
Title |
InfographicVQA |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1697-1706 |
|
|
Keywords |
Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages |
|
|
Abstract |
Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two Transformer-based strong baselines. Both the baselines yield unsatisfactory results compared to near perfect human performance on the dataset. The results suggest that VQA on infographics--images that are designed to communicate information quickly and clearly to human brain--is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa. org |
|
|
Address |
Virtual; Waikoloa; Hawai; USA; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.155 |
Approved |
no |
|
|
Call Number |
MBT2022 |
Serial |
3625 |
|
Permanent link to this record |
|
|
|
|
Author |
Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund |
|
|
Title |
Multi-Task Classification of Sewer Pipe Defects and Properties Using a Cross-Task Graph Neural Network Decoder |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2806-2817 |
|
|
Keywords |
Vision Systems; Applications Multi-Task Classification |
|
|
Abstract |
The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer
defect classification has been investigated for decades, little attention has been given to classifying sewer pipe properties such as water level, pipe material, and pipe shape, which are needed to evaluate the level of sewer pipe deterioration.
In this work we classify sewer pipe defects and properties concurrently and present a novel decoder-focused multi-task classification architecture Cross-Task Graph Neural Network (CT-GNN), which refines the disjointed per-task predictions using cross-task information. The CT-GNN architecture extends the traditional disjointed task-heads decoder, by utilizing a cross-task graph and unique class node embeddings. The cross-task graph can either be determined a priori based on the conditional probability between the task classes or determined dynamically using self-attention.
CT-GNN can be added to any backbone and trained end-toend at a small increase in the parameter count. We achieve state-of-the-art performance on all four classification tasks in the Sewer-ML dataset, improving defect classification and
water level classification by 5.3 and 8.0 percentage points, respectively. We also outperform the single task methods as well as other multi-task classification approaches while introducing 50 times fewer parameters than previous modelfocused approaches. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
HUPBA; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ BME2022 |
Serial |
3638 |
|
Permanent link to this record |
|
|
|
|
Author |
Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1381-1390 |
|
|
Keywords |
Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data |
|
|
Abstract |
Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in the state-of-the-art captioning models which is not desirable by humans. To decrease the object hallucination in captioning, we propose three simple yet efficient training augmentation method for sentences which requires no new training data or increase
in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online. |
|
|
Address |
Virtual; Waikoloa; Hawai; USA; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.155; 302.105 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BGK2022 |
Serial |
3662 |
|
Permanent link to this record |
|
|
|
|
Author |
Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1391-1400 |
|
|
Keywords |
Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning |
|
|
Abstract |
The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github. com/furkanbiten/ncsmetric and the model implementation at github. com/andrespmd/semanticadaptive_margin. |
|
|
Address |
Virtual; Waikoloa; Hawai; USA; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.155; 302.105; |
Approved |
no |
|
|
Call Number |
Admin @ si @ BMG2022 |
Serial |
3663 |
|
Permanent link to this record |
|
|
|
|
Author |
Javad Zolfaghari Bengar; Joost Van de Weijer; Laura Lopez-Fuentes; Bogdan Raducanu |
|
|
Title |
Class-Balanced Active Learning for Image Classification |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Active learning aims to reduce the labeling effort that is required to train algorithms by learning an acquisition function selecting the most relevant data for which a label should be requested from a large unlabeled data pool. Active learning is generally studied on balanced datasets where an equal amount of images per class is available. However, real-world datasets suffer from severe imbalanced classes, the so called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we proposed a general optimization framework that explicitly takes class-balancing into account. Results on three datasets showed that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we showed that also on balanced datasets
our method 1 generally results in a performance gain. |
|
|
Address |
Virtual; Waikoloa; Hawai; USA; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
LAMP; 602.200; 600.147; 600.120 |
Approved |
no |
|
|
Call Number |
Admin @ si @ ZWL2022 |
Serial |
3703 |
|
Permanent link to this record |
|
|
|
|
Author |
Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla |
|
|
Title |
Thermal Image Super-Resolution: A Novel Unsupervised Approach |
Type |
Conference Article |
|
Year |
2022 |
Publication |
International Joint Conference on Computer Vision, Imaging and Computer Graphics |
Abbreviated Journal |
|
|
|
Volume |
1474 |
Issue |
|
Pages |
495–506 |
|
|
Keywords |
|
|
|
Abstract |
This paper proposes the use of a CycleGAN architecture for thermal image super-resolution under a transfer domain strategy, where middle-resolution images from one camera are transferred to a higher resolution domain of another camera. The proposed approach is trained with a large dataset acquired using three thermal cameras at different resolutions. An unsupervised learning process is followed to train the architecture. Additional loss function is proposed trying to improve results from the state of the art approaches. Following the first thermal image super-resolution challenge (PBVS-CVPR2020) evaluations are performed. A comparison with previous works is presented showing the proposed approach reaches the best results. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
VISIGRAPP |
|
|
Notes |
MSIAU; 600.130 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RSV2022d |
Serial |
3776 |
|
Permanent link to this record |
|
|
|
|
Author |
Jorge Charco; Angel Sappa; Boris X. Vintimilla |
|
|
Title |
Human Pose Estimation through a Novel Multi-view Scheme |
Type |
Conference Article |
|
Year |
2022 |
Publication |
17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) |
Abbreviated Journal |
|
|
|
Volume |
5 |
Issue |
|
Pages |
855-862 |
|
|
Keywords |
Multi-view Scheme; Human Pose Estimation; Relative Camera Pose; Monocular Approach |
|
|
Abstract |
This paper presents a multi-view scheme to tackle the challenging problem of the self-occlusion in human pose estimation problem. The proposed approach first obtains the human body joints of a set of images, which are captured from different views at the same time. Then, it enhances the obtained joints by using a
multi-view scheme. Basically, the joints from a given view are used to enhance poorly estimated joints from another view, especially intended to tackle the self occlusions cases. A network architecture initially proposed for the monocular case is adapted to be used in the proposed multi-view scheme. Experimental results and
comparisons with the state-of-the-art approaches on Human3.6m dataset are presented showing improvements in the accuracy of body joints estimations. |
|
|
Address |
On line; Feb 6, 2022 – Feb 8, 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
2184-4321 |
ISBN |
978-989-758-555-5 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
VISAPP |
|
|
Notes |
MSIAU; 600.160 |
Approved |
no |
|
|
Call Number |
Admin @ si @ CSV2022 |
Serial |
3689 |
|
Permanent link to this record |
|
|
|
|
Author |
Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla |
|
|
Title |
Multi-Image Super-Resolution for Thermal Images |
Type |
Conference Article |
|
Year |
2022 |
Publication |
17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) |
Abbreviated Journal |
|
|
|
Volume |
4 |
Issue |
|
Pages |
635-642 |
|
|
Keywords |
Thermal Images; Multi-view; Multi-frame; Super-Resolution; Deep Learning; Attention Block |
|
|
Abstract |
This paper proposes a novel CNN architecture for the multi-thermal image super-resolution problem. In the proposed scheme, the multi-images are synthetically generated by downsampling and slightly shifting the given image; noise is also added to each of these synthesized images. The proposed architecture uses two
attention blocks paths to extract high-frequency details taking advantage of the large information extracted from multiple images of the same scene. Experimental results are provided, showing the proposed scheme has overcome the state-of-the-art approaches. |
|
|
Address |
Online; Feb 6-8, 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
VISAPP |
|
|
Notes |
MSIAU; 601.349 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RSV2022a |
Serial |
3690 |
|
Permanent link to this record |
|
|
|
|
Author |
Bhalaji Nagarajan; Ricardo Marques; Marcos Mejia; Petia Radeva |
|
|
Title |
Class-conditional Importance Weighting for Deep Learning with Noisy Labels |
Type |
Conference Article |
|
Year |
2022 |
Publication |
17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |
Abbreviated Journal |
|
|
|
Volume |
5 |
Issue |
|
Pages |
679-686 |
|
|
Keywords |
Noisy Labeling; Loss Correction; Class-conditional Importance Weighting; Learning with Noisy Labels |
|
|
Abstract |
Large-scale accurate labels are very important to the Deep Neural Networks to train them and assure high performance. However, it is very expensive to create a clean dataset since usually it relies on human interaction. To this purpose, the labelling process is made cheap with a trade-off of having noisy labels. Learning with Noisy Labels is an active area of research being at the same time very challenging. The recent advances in Self-supervised learning and robust loss functions have helped in advancing noisy label research. In this paper, we propose a loss correction method that relies on dynamic weights computed based on the model training. We extend the existing Contrast to Divide algorithm coupled with DivideMix using a new class-conditional weighted scheme. We validate the method using the standard noise experiments and achieved encouraging results. |
|
|
Address |
Virtual; February 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
VISAPP |
|
|
Notes |
MILAB; no menciona |
Approved |
no |
|
|
Call Number |
Admin @ si @ NMM2022 |
Serial |
3798 |
|
Permanent link to this record |
|
|
|
|
Author |
Vishwesh Pillai; Pranav Mehar; Manisha Das; Deep Gupta; Petia Radeva |
|
|
Title |
Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty |
Type |
Conference Article |
|
Year |
2022 |
Publication |
IEEE International Conference on Signal Processing and Communications |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
The problem of food image recognition is an essential one in today’s context because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person’s diet. To automate this process, several models are available to recognize food images. Due to a considerable number of unique food dishes and various cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers, with the analysis of epistemic uncertainty are used to switch between the classifiers. However, the accuracy of the predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that helps to increase the accuracy of epistemic uncertainty predictions. The performance of the proposed method is demonstrated using several experiments performed on the MAFood-121 dataset. The experimental results validate the proposal performance and show that the proposed threshold criteria help to increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples. |
|
|
Address |
Bangalore; India; July 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
SPCOM |
|
|
Notes |
MILAB; no menciona |
Approved |
no |
|
|
Call Number |
Admin @ si @ PMD2022 |
Serial |
3796 |
|
Permanent link to this record |
|
|
|
|
Author |
Patricia Suarez; Dario Carpio; Angel Sappa; Henry Velesaca |
|
|
Title |
Transformer based Image Dehazing |
Type |
Conference Article |
|
Year |
2022 |
Publication |
16th IEEE International Conference on Signal Image Technology & Internet Based System |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
atmospheric light; brightness component; computational cost; dehazing quality; haze-free image |
|
|
Abstract |
This paper presents a novel approach to remove non homogeneous haze from real images. The proposed method consists mainly of image feature extraction, haze removal, and image reconstruction. To accomplish this challenging task, we propose an architecture based on transformers, which have been recently introduced and have shown great potential in different computer vision tasks. Our model is based on the SwinIR an image restoration architecture based on a transformer, but by modifying the deep feature extraction module, the depth level of the model, and by applying a combined loss function that improves styling and adapts the model for the non-homogeneous haze removal present in images. The obtained results prove to be superior to those obtained by state-of-the-art models. |
|
|
Address |
Dijon; France; October 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
SITIS |
|
|
Notes |
MSIAU; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ SCS2022 |
Serial |
3803 |
|
Permanent link to this record |