Records |
Author |
Raquel Justo; Leila Ben Letaifa; Cristina Palmero; Eduardo Gonzalez-Fraile; Anna Torp Johansen; Alain Vazquez; Gennaro Cordasco; Stephan Schlogl; Begoña Fernandez-Ruanova; Micaela Silva; Sergio Escalera; Mikel de Velasco; Joffre Tenorio-Laranga; Anna Esposito; Maria Korsnes; M. Ines Torres |
Title |
Analysis of the Interaction between Elderly People and a Simulated Virtual Coach, Journal of Ambient Intelligence and Humanized Computing |
Type |
Journal Article |
Year |
2020 |
Publication |
Journal of Ambient Intelligence and Humanized Computing |
Abbreviated Journal |
AIHC |
Volume |
11 |
Issue |
12 |
Pages |
6125-6140 |
Keywords |
|
Abstract |
The EMPATHIC project develops and validates new interaction paradigms for personalized virtual coaches (VC) to promote healthy and independent aging. To this end, the work presented in this paper is aimed to analyze the interaction between the EMPATHIC-VC and the users. One of the goals of the project is to ensure an end-user driven design, involving senior users from the beginning and during each phase of the project. Thus, the paper focuses on some sessions where the seniors carried out interactions with a Wizard of Oz driven, simulated system. A coaching strategy based on the GROW model was used throughout these sessions so as to guide interactions and engage the elderly with the goals of the project. In this interaction framework, both the human and the system behavior were analyzed. The way the wizard implements the GROW coaching strategy is a key aspect of the system behavior during the interaction. The language used by the virtual agent as well as his or her physical aspect are also important cues that were analyzed. Regarding the user behavior, the vocal communication provides information about the speaker’s emotional status, that is closely related to human behavior and which can be extracted from the speech and language analysis. In the same way, the analysis of the facial expression, gazes and gestures can provide information on the non verbal human communication even when the user is not talking. In addition, in order to engage senior users, their preferences and likes had to be considered. To this end, the effect of the VC on the users was gathered by means of direct questionnaires. These analyses have shown a positive and calm behavior of users when interacting with the simulated virtual coach as well as some difficulties of the system to develop the proposed coaching strategy. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HuPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ JLP2020 |
Serial |
3443 |
Permanent link to this record |
|
|
|
Author |
Cristhian A. Aguilera-Carrasco; Cristhian Aguilera; Cristobal A. Navarro; Angel Sappa |
Title |
Fast CNN Stereo Depth Estimation through Embedded GPU Devices |
Type |
Journal Article |
Year |
2020 |
Publication |
Sensors |
Abbreviated Journal |
SENS |
Volume |
20 |
Issue |
11 |
Pages |
3249 |
Keywords |
stereo matching; deep learning; embedded GPU |
Abstract |
Current CNN-based stereo depth estimation models can barely run under real-time constraints on embedded graphic processing unit (GPU) devices. Moreover, state-of-the-art evaluations usually do not consider model optimization techniques, being that it is unknown what is the current potential on embedded GPU devices. In this work, we evaluate two state-of-the-art models on three different embedded GPU devices, with and without optimization methods, presenting performance results that illustrate the actual capabilities of embedded GPU devices for stereo depth estimation. More importantly, based on our evaluation, we propose the use of a U-Net like architecture for postprocessing the cost-volume, instead of a typical sequence of 3D convolutions, drastically augmenting the runtime speed of current models. In our experiments, we achieve real-time inference speed, in the range of 5–32 ms, for 1216 × 368 input stereo images on the Jetson TX2, Jetson Xavier, and Jetson Nano embedded devices. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MSIAU; 600.122 |
Approved |
no |
Call Number |
Admin @ si @ AAN2020 |
Serial |
3428 |
Permanent link to this record |
|
|
|
Author |
Hassan Ahmed Sial; Ramon Baldrich; Maria Vanrell |
Title |
Deep intrinsic decomposition trained on surreal scenes yet with realistic light effects |
Type |
Journal Article |
Year |
2020 |
Publication |
Journal of the Optical Society of America A |
Abbreviated Journal |
JOSA A |
Volume |
37 |
Issue |
1 |
Pages |
1-15 |
Keywords |
|
Abstract |
Estimation of intrinsic images still remains a challenging task due to weaknesses of ground-truth datasets, which either are too small or present non-realistic issues. On the other hand, end-to-end deep learning architectures start to achieve interesting results that we believe could be improved if important physical hints were not ignored. In this work, we present a twofold framework: (a) a flexible generation of images overcoming some classical dataset problems such as larger size jointly with coherent lighting appearance; and (b) a flexible architecture tying physical properties through intrinsic losses. Our proposal is versatile, presents low computation time, and achieves state-of-the-art results. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
CIC; 600.140; 600.12; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ SBV2019 |
Serial |
3311 |
Permanent link to this record |
|
|
|
Author |
Ajian Liu; Xuan Li; Jun Wan; Yanyan Liang; Sergio Escalera; Hugo Jair Escalante; Meysam Madadi; Yi Jin; Zhuoyuan Wu; Xiaogang Yu; Zichang Tan; Qi Yuan; Ruikun Yang; Benjia Zhou; Guodong Guo; Stan Z. Li |
Title |
Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review |
Type |
Journal Article |
Year |
2020 |
Publication |
IET Biometrics |
Abbreviated Journal |
BIO |
Volume |
10 |
Issue |
1 |
Pages |
24-43 |
Keywords |
|
Abstract |
Face anti-spoofing is critical to prevent face recognition systems from a security breach. The biometrics community has %possessed achieved impressive progress recently due the excellent performance of deep neural networks and the availability of large datasets. Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing. Recently, a multi-ethnic face anti-spoofing dataset, CASIA-SURF CeFA, has been released with the goal of measuring the ethnic bias. It is the largest up to date cross-ethnicity face anti-spoofing dataset covering 3 ethnicities, 3 modalities, 1,607 subjects, 2D plus 3D attack types, and the first dataset including explicit ethnic labels among the recently released datasets for face anti-spoofing. We organized the Chalearn Face Anti-spoofing Attack Detection Challenge which consists of single-modal (e.g., RGB) and multi-modal (e.g., RGB, Depth, Infrared (IR)) tracks around this novel resource to boost research aiming to alleviate the ethnic bias. Both tracks have attracted 340 teams in the development stage, and finally 11 and 8 teams have submitted their codes in the single-modal and multi-modal face anti-spoofing recognition challenges, respectively. All the results were verified and re-ran by the organizing team, and the results were used for the final ranking. This paper presents an overview of the challenge, including its design, evaluation protocol and a summary of results. We analyze the top ranked solutions and draw conclusions derived from the competition. In addition we outline future work directions. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ LLW2020b |
Serial |
3523 |
Permanent link to this record |
|
|
|
Author |
Guillermo Torres; Debora Gil |
Title |
A multi-shape loss function with adaptive class balancing for the segmentation of lung structures |
Type |
Journal Article |
Year |
2020 |
Publication |
International Journal of Computer Assisted Radiology and Surgery |
Abbreviated Journal |
IJCAR |
Volume |
15 |
Issue |
1 |
Pages |
S154-55 |
Keywords |
|
Abstract |
|
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
IAM |
Approved |
no |
Call Number |
Admin @ si @ ToG2020 |
Serial |
3590 |
Permanent link to this record |
|
|
|
Author |
Lei Kang; Marçal Rusiñol; Alicia Fornes; Pau Riba; Mauricio Villegas |
Title |
Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition |
Type |
Conference Article |
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions in real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance, thus, it provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step. |
Address |
Aspen; Colorado; USA; March 2020 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
DAG; 600.129; 600.140; 601.302; 601.312; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ KRF2020 |
Serial |
3446 |
Permanent link to this record |
|
|
|
Author |
Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas |
Title |
Exploring Hate Speech Detection in Multimodal Publications |
Type |
Conference Article |
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research. |
Address |
Aspen; March 2020 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
Call Number |
Admin @ si @ GGG2020a |
Serial |
3280 |
Permanent link to this record |
|
|
|
Author |
Edgar Riba; D. Mishkin; Daniel Ponsa; E. Rublee; G. Bradski |
Title |
Kornia: an Open Source Differentiable Computer Vision Library for PyTorch |
Type |
Conference Article |
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
|
Address |
Aspen; Colorado; USA; March 2020 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
MSIAU; 600.122; 600.130 |
Approved |
no |
Call Number |
Admin @ si @ RMP2020 |
Serial |
3291 |
Permanent link to this record |
|
|
|
Author |
Cesar de Souza; Adrien Gaidon; Yohann Cabon; Naila Murray; Antonio Lopez |
Title |
Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models |
Type |
Journal Article |
Year |
2020 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
Volume |
128 |
Issue |
|
Pages |
1505–1536 |
Keywords |
Procedural generation; Human action recognition; Synthetic data; Physics |
Abstract |
Deep video action recognition models have been highly successful in recent years but require large quantities of manually-annotated data, which are expensive and laborious to obtain. In this work, we investigate the generation of synthetic training data for video action recognition, as synthetic data have been successfully used to supervise models for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation, physics models and other components of modern game engines. With this model we generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. PHAV contains a total of 39,982 videos, with more than 1000 examples for each of 35 action categories. Our video generation approach is not limited to existing motion capture sequences: 14 of these 35 categories are procedurally-defined synthetic actions. In addition, each video is represented with 6 different data modalities, including RGB, optical flow and pixel-level semantic labels. These modalities are generated almost simultaneously using the Multiple Render Targets feature of modern GPUs. In order to leverage PHAV, we introduce a deep multi-task (i.e. that considers action classes from multiple datasets) representation learning architecture that is able to simultaneously learn from synthetic and real video datasets, even when their action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance. Our approach also significantly outperforms video representations produced by fine-tuning state-of-the-art unsupervised generative models of videos. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.124; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ SGC2019 |
Serial |
3303 |
Permanent link to this record |
|
|
|
Author |
Ivet Rafegas; Maria Vanrell; Luis A Alexandre; G. Arias |
Title |
Understanding trained CNNs by indexing neuron selectivity |
Type |
Journal Article |
Year |
2020 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
Volume |
136 |
Issue |
|
Pages |
318-325 |
Keywords |
|
Abstract |
The impressive performance of Convolutional Neural Networks (CNNs) when solving different vision problems is shadowed by their black-box nature and our consequent lack of understanding of the representations they build and how these representations are organized. To help understanding these issues, we propose to describe the activity of individual neurons by their Neuron Feature visualization and quantify their inherent selectivity with two specific properties. We explore selectivity indexes for: an image feature (color); and an image label (class membership). Our contribution is a framework to seek or classify neurons by indexing on these selectivity properties. It helps to find color selective neurons, such as a red-mushroom neuron in layer Conv4 or class selective neurons such as dog-face neurons in layer Conv5 in VGG-M, and establishes a methodology to derive other selectivity properties. Indexing on neuron selectivity can statistically draw how features and classes are represented through layers in a moment when the size of trained nets is growing and automatic tools to index neurons can be helpful. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
CIC; 600.087; 600.140; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ RVL2019 |
Serial |
3310 |
Permanent link to this record |
|
|
|
Author |
Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger |
Title |
Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network |
Type |
Journal Article |
Year |
2020 |
Publication |
Automation in Construction |
Abbreviated Journal |
AC |
Volume |
110 |
Issue |
|
Pages |
102973 |
Keywords |
Semantic image segmentation; Deep learning |
Abstract |
Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches are widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of bridge surface structural damage detection, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damages requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area could be very small, whereas the background area is large, which creates an unbalanced training environment; (iii) due to the difficulty to exactly determine the extension of the damage, there is often a variation among different labelers who perform pixel-wise labeling. In this paper, we propose a novel model for bridge structural damage detection to address the first two challenges. This paper follows the idea of an atrous spatial pyramid pooling (ASPP) module that is designed as a novel network for bridge damage detection. Further, we introduce the weight balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection, as compared to cross entropy loss or focal loss, and (ii) the proposed model has a better ability to detect a minority class than other light segmentation networks. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HuPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ DMK2020 |
Serial |
3314 |
Permanent link to this record |
|
|
|
Author |
Sergio Escalera; Ralf Herbrich |
Title |
The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations |
Type |
Book Whole |
Year |
2020 |
Publication |
The Springer Series on Challenges in Machine Learning |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
Sergio Escalera; Ralf Hebrick |
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
2520-1328 |
ISBN |
978-3-030-29134-1 |
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HuPBA; no menciona |
Approved |
no |
Call Number |
Admin @ si @ HeE2020 |
Serial |
3328 |
Permanent link to this record |
|
|
|
Author |
Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas |
Title |
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features |
Type |
Conference Article |
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval. |
Address |
Aspen; Colorado; USA; March 2020 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
Call Number |
Admin @ si @ MDB2020 |
Serial |
3334 |
Permanent link to this record |
|
|
|
Author |
Arnau Baro; Alicia Fornes; Carles Badal |
Title |
Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism |
Type |
Conference Article |
Year |
2020 |
Publication |
17th International Conference on Frontiers in Handwriting Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks. |
Address |
Virtual ICFHR; September 2020 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICFHR |
Notes |
DAG; 600.140; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ BFB2020 |
Serial |
3448 |
Permanent link to this record |
|
|
|
Author |
Fei Yang; Yongmei Cheng; Joost Van de Weijer; Mikhail Mozerov |
Title |
Improved Discrete Optical Flow Estimation With Triple Image Matching Cost |
Type |
Journal Article |
Year |
2020 |
Publication |
IEEE Access |
Abbreviated Journal |
ACCESS |
Volume |
8 |
Issue |
|
Pages |
17093 - 17102 |
Keywords |
|
Abstract |
Approaches that use more than two consecutive video frames in the optical flow estimation have a long research history. However, almost all such methods utilize extra information for a pre-processing flow prediction or for a post-processing flow correction and filtering. In contrast, this paper differs from previously developed techniques. We propose a new algorithm for the likelihood function calculation (alternatively the matching cost volume) that is used in the maximum a posteriori estimation. We exploit the fact that in general, optical flow is locally constant in the sense of time and the likelihood function depends on both the previous and the future frame. Implementation of our idea increases the robustness of optical flow estimation. As a result, our method outperforms 9% over the DCFlow technique, which we use as prototype for our CNN based computation architecture, on the most challenging MPI-Sintel dataset for the non-occluded mask metric. Furthermore, our approach considerably increases the accuracy of the flow estimation for the matching cost processing, consequently outperforming the original DCFlow algorithm results up to 50% in occluded regions and up to 9% in non-occluded regions on the MPI-Sintel dataset. The experimental section shows that the proposed method achieves state-of-the-arts results especially on the MPI-Sintel dataset. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ YCW2020 |
Serial |
3345 |
Permanent link to this record |