Records
Author Giuseppe Pezzano; Oliver Diaz; Vicent Ribas Ripoll; Petia Radeva
Title CoLe-CNN+: Context learning – Convolutional neural network for COVID-19-Ground-Glass-Opacities detection and segmentation Type Journal Article
Year 2021 Publication Computers in Biology and Medicine Abbreviated Journal CBM
Volume 136 Issue Pages 104689
Keywords
Abstract The most common tool for population-wide COVID-19 identification is the Reverse Transcription-Polymerase Chain Reaction test that detects the presence of the virus in the throat (or sputum) in swab samples. This test has a sensitivity between 59% and 71%. However, this test does not provide precise information regarding the extent of the pulmonary infection. Moreover, it has been proven that through the reading of a computed tomography (CT) scan, a clinician can provide a more complete perspective of the severity of the disease. Therefore, we propose a comprehensive system for fully-automated COVID-19 detection and lesion segmentation from CT scans, powered by deep learning strategies to support the decision-making process for the diagnosis of COVID-19.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; not mentioned Approved no
Call Number Admin @ si @ PDR2021 Serial (down) 3635
Permanent link to this record
 

 
Author Andreea Glavan; Alina Matei; Petia Radeva; Estefania Talavera
Title Does our social life influence our nutritional behaviour? Understanding nutritional habits from egocentric photo-streams Type Journal Article
Year 2021 Publication Expert Systems with Applications Abbreviated Journal ESWA
Volume 171 Issue Pages 114506
Keywords
Abstract Nutrition and social interactions are both key aspects of the daily lives of humans. In this work, we propose a system to evaluate the influence of social interaction on the nutritional habits of a person from a first-person perspective. In order to detect the routine of an individual, we construct a nutritional behaviour pattern discovery model, which outputs routines over a number of days. Our method evaluates similarity of routines with respect to visited food-related scenes over the collected days, making use of Dynamic Time Warping, as well as considering social engagement and its correlation with food-related activities. The nutritional and social descriptors of the collected days are evaluated and encoded using an LSTM Autoencoder. Later, the obtained latent space is clustered to find similar days unaffected by outliers using the Isolation Forest method. Moreover, we introduce a new score metric to evaluate the performance of the proposed algorithm. We validate our method on 104 days and more than 100k egocentric images gathered by 7 users. Several different visualizations are evaluated for the understanding of the findings. Our results demonstrate good performance and applicability of our proposed model for social-related nutritional behaviour understanding. At the end, relevant applications of the model are discussed by analysing the discovered routine of particular individuals.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no proj Approved no
Call Number Admin @ si @ GMR2021 Serial (down) 3634
Permanent link to this record
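
The abstract above compares daily routines with Dynamic Time Warping (DTW). As a rough illustration only, the following minimal sketch computes a textbook DTW distance between two sequences of scene labels. The function name `dtw_distance`, the 0/1 mismatch cost, and the example label sequences are assumptions made for this sketch; they are not taken from the paper, whose descriptors and cost function are not given in this record.

```python
def dtw_distance(a, b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Textbook DTW distance between two sequences.

    The 0/1 label-mismatch cost is an assumption for illustration;
    the paper's actual cost function is not described in the abstract.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of: deletion, insertion, or match.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical scene sequences for two days (illustrative data only).
day_a = ["kitchen", "office", "restaurant", "office"]
day_b = ["kitchen", "office", "office", "restaurant", "office"]
print(dtw_distance(day_a, day_b))
```

Because DTW allows one element to align with several consecutive elements of the other sequence, two days that visit the same scenes in the same order but for different durations come out as very close.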
 

 
Author Md. Mostafa Kamal Sarker; Hatem A. Rashwan; Farhan Akram; Vivek Kumar Singh; Syeda Furruka Banu; Forhad U H Chowdhury; Kabir Ahmed Choudhury; Sylvie Chambon; Petia Radeva; Domenec Puig; Mohamed Abdel-Nasser
Title SLSNet: Skin lesion segmentation using a lightweight generative adversarial network Type Journal Article
Year 2021 Publication Expert Systems With Applications Abbreviated Journal ESWA
Volume 183 Issue Pages 115433
Keywords
Abstract The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly, the presence of hair, inconspicuous lesion edges and low contrast in dermoscopic images, and variability in the color, texture and shapes of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computational time and memory. Consequently, running such segmentation algorithms requires a powerful GPU and high bandwidth memory, which are not available in dermoscopy devices. Thus, this article aims to achieve precise skin lesion segmentation with minimum resources: a lightweight, efficient generative adversarial network (GAN) model called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model. The 1-D kernel factorized network reduces the computational cost of 2D filtering. The position and channel attention modules enhance the discriminative ability between the lesion and non-lesion feature representations in spatial and channel dimensions, respectively. A multiscale block is also used to aggregate the coarse-to-fine features of input skin images and reduce the effect of the artifacts. SLSNet is evaluated on two publicly available datasets: ISBI 2017 and the ISIC 2018. Although SLSNet has only 2.35 million parameters, the experimental results demonstrate that it achieves segmentation results on a par with the state-of-the-art skin lesion segmentation methods with an accuracy of 97.61%, and Dice and Jaccard similarity coefficients of 90.63% and 81.98%, respectively. SLSNet can run at more than 110 frames per second (FPS) in a single GTX1080Ti GPU, which is faster than well-known deep learning-based image segmentation models, such as FCN. Therefore, SLSNet can be used for practical dermoscopic applications.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no proj Approved no
Call Number Admin @ si @ SRA2021 Serial (down) 3633
Permanent link to this record
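
The abstract above reports Dice and Jaccard similarity coefficients. As a hedged illustration of how these two overlap measures are commonly defined for binary masks, the sketch below computes both on flattened masks. The function name `dice_and_jaccard`, the convention of returning 1.0 for two empty masks, and the toy masks are assumptions of this sketch, not details from the paper.

```python
def dice_and_jaccard(pred, truth):
    """Dice and Jaccard coefficients for two flattened binary masks.

    Returning 1.0 when both masks are empty is a convention chosen
    for this sketch, not something stated in the source.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - overlap
    dice = 2 * overlap / (pred_area + truth_area) if (pred_area + truth_area) else 1.0
    jaccard = overlap / union if union else 1.0
    return dice, jaccard


# Toy masks (illustrative data only): 1 = lesion pixel, 0 = background.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
dice, jaccard = dice_and_jaccard(predicted, reference)
print(f"Dice={dice:.4f} Jaccard={jaccard:.4f}")
```

For a single pair of masks the two measures are linked by J = D / (2 - D), so they always rank a pair of segmentations the same way; averages taken over many images need not satisfy that identity.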
 

 
Author Manisha Das; Deep Gupta; Petia Radeva; Ashwini M. Bakde
Title Multi-scale decomposition-based CT-MR neurological image fusion using optimized bio-inspired spiking neural model with meta-heuristic optimization Type Journal Article
Year 2021 Publication International Journal of Imaging Systems and Technology Abbreviated Journal IMA
Volume 31 Issue 4 Pages 2170-2188
Keywords
Abstract Multi-modal medical image fusion plays an important role in clinical diagnosis and works as an assistance model for clinicians. In this paper, a computed tomography-magnetic resonance (CT-MR) image fusion model is proposed using an optimized bio-inspired spiking feedforward neural network in different decomposition domains. First, source images are decomposed into base (low-frequency) and detail (high-frequency) layer components. Low-frequency subbands are fused using texture energy measures to capture the local energy, contrast, and small edges in the fused image. High-frequency coefficients are fused using firing maps obtained by a pixel-activated neural model whose parameters are optimized individually with three different techniques: differential evolution, cuckoo search, and gray wolf optimization. In the optimization model, a fitness function is computed based on the edge index of resultant fused images, which helps to extract and preserve sharp edges available in the source CT and MR images. To validate the fusion performance, a detailed comparative analysis is presented among the proposed and state-of-the-art methods in terms of quantitative and qualitative measures along with computational complexity. Experimental results show that the proposed method produces fused images of significantly better visual quality and outperforms the existing methods.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; not mentioned Approved no
Call Number Admin @ si @ DGR2021a Serial (down) 3630
Permanent link to this record
 

 
Author Ahmed M. A. Salih; Ilaria Boscolo Galazzo; Zahra Raisi-Estabragh; Steffen E. Petersen; Polyxeni Gkontra; Karim Lekadir; Gloria Menegaz; Petia Radeva
Title A new scheme for the assessment of the robustness of Explainable Methods Applied to Brain Age estimation Type Conference Article
Year 2021 Publication 34th International Symposium on Computer-Based Medical Systems Abbreviated Journal
Volume Issue Pages 492-497
Keywords
Abstract Deep learning methods show great promise in a range of settings including the biomedical field. Explainability of these models is important in these fields for building end-user trust and to facilitate their confident deployment. Although several Machine Learning Interpretability tools have been proposed so far, there is currently no recognized evaluation standard to transfer the explainability results into a quantitative score. Several measures have been proposed as proxies for quantitative assessment of explainability methods. However, the robustness of the list of significant features provided by the explainability methods has not been addressed. In this work, we propose a new proxy for assessing the robustness of the list of significant features provided by two explainability methods. Our validation is defined at functionality-grounded level based on the ranked correlation statistical index and demonstrates its successful application in the framework of brain aging estimation. We assessed our proxy to estimate brain age using neuroscience data. Our results indicate small variability and high robustness in the considered explainability methods using this new proxy.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CBMS
Notes MILAB; no proj Approved no
Call Number Admin @ si @ SBZ2021 Serial (down) 3629
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz; Kadi Bouatouch
Title Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications – (Volume 5) Type Book Whole
Year 2021 Publication Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications – VISIGRAPP 2021 Abbreviated Journal
Volume 5 Issue Pages
Keywords
Abstract This book contains the proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) which was organized and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), endorsed by the International Association for Pattern Recognition (IAPR), and in cooperation with the ACM Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), the European Association for Computer Graphics (EUROGRAPHICS), the EUROGRAPHICS Portuguese Chapter, the VRVis Center for Virtual Reality and Visualization Forschungs-GmbH, the French Association for Computer Graphics (AFIG), and the Society for Imaging Science and Technology (IS&T). The proceedings here published demonstrate new and innovative solutions and highlight technical problems in each field that are challenging and worthy of being disseminated to the interested research audiences. VISIGRAPP 2021 was organized to promote a discussion forum about the conference’s research topics between researchers, developers, manufacturers and end-users, and to establish guidelines in the development of more advanced solutions. This year VISIGRAPP was, exceptionally, held as a web-based event, due to the COVID-19 pandemic, from 8 – 10 February. We received a high number of paper submissions for this edition of VISIGRAPP, 371 in total, with contributions from 52 countries. This attests to the success and global dimension of VISIGRAPP. To evaluate each submission, we used a hierarchical process of double-blind evaluation where each paper was reviewed by two to six experts from the International Program Committee (IPC). The IPC selected for oral presentation and for publication as full papers 12 papers from GRAPP, 8 from HUCAPP, 11 papers from IVAPP, and 56 papers from VISAPP, which led to a result for the full-paper acceptance ratio of 24% and a high-quality program. 
Apart from the above full papers, the conference program also features 118 short papers and 67 poster presentations. We hope that these conference proceedings, which are submitted for indexation by Thomson Reuters Conference Proceedings Citation Index, SCOPUS, DBLP, Semantic Scholar, Google Scholar, EI and Microsoft Academic, will help the Computer Vision, Imaging, Visualization, Computer Graphics and Human-Computer Interaction communities to find interesting research work. Moreover, we are proud to inform that the program also includes three plenary keynote lectures, given by internationally distinguished researchers, namely Federico Tombari (Google and Technical University of Munich, Germany), Dieter Schmalstieg (Graz University of Technology, Austria) and Nathalie Henry Riche (Microsoft Research, United States), thus contributing to increase the overall quality of the conference and to provide a deeper understanding of the conference’s interest fields. Furthermore, a short list of the presented papers will be selected to be extended into a forthcoming book of VISIGRAPP Selected Papers to be published by Springer during 2021 in the CCIS series. Moreover, a short list of presented papers will be selected for publication of extended and revised versions in a special issue of the Springer Nature Computer Science journal. All papers presented at this conference will be available at the SCITEPRESS Digital Library. Three awards are delivered at the closing session, to recognize the best conference paper, the best student paper and the best poster for each of the four conferences. There is also an award for best industrial paper to be delivered at the closing session for VISAPP. We would like to express our thanks, first of all, to the authors of the technical papers, whose work and dedication made it possible to put together a program that we believe to be very exciting and of high technical quality. 
Next, we would like to thank the Area Chairs, all the members of the program committee and auxiliary reviewers, who helped us with their expertise and time. We would also like to thank the invited speakers for their invaluable contribution and for sharing their vision in their talks. Finally, we gratefully acknowledge the professional support of the INSTICC team for all organizational processes, especially given the need to introduce online streaming, forum management, direct messaging facilitation and other web-based activities in order to make it possible for VISIGRAPP 2021 authors to present their work and share ideas with colleagues in spite of the logistic difficulties caused by the current pandemic situation. We wish you all an exciting conference. We hope to meet you again for the next edition of VISIGRAPP, details of which are available at http://www.visigrapp.org.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes MILAB Approved no
Call Number Admin @ si @ FRB2021b Serial (down) 3628
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz; Kadi Bouatouch
Title Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Volume 4) Type Book Whole
Year 2021 Publication Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021 Abbreviated Journal
Volume 4 Issue Pages
Keywords
Abstract This book contains the proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) which was organized and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), endorsed by the International Association for Pattern Recognition (IAPR), and in cooperation with the ACM Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), the European Association for Computer Graphics (EUROGRAPHICS), the EUROGRAPHICS Portuguese Chapter, the VRVis Center for Virtual Reality and Visualization Forschungs-GmbH, the French Association for Computer Graphics (AFIG), and the Society for Imaging Science and Technology (IS&T). The proceedings here published demonstrate new and innovative solutions and highlight technical problems in each field that are challenging and worthy of being disseminated to the interested research audiences. VISIGRAPP 2021 was organized to promote a discussion forum about the conference’s research topics between researchers, developers, manufacturers and end-users, and to establish guidelines in the development of more advanced solutions. This year VISIGRAPP was, exceptionally, held as a web-based event, due to the COVID-19 pandemic, from 8 – 10 February. We received a high number of paper submissions for this edition of VISIGRAPP, 371 in total, with contributions from 52 countries. This attests to the success and global dimension of VISIGRAPP. To evaluate each submission, we used a hierarchical process of double-blind evaluation where each paper was reviewed by two to six experts from the International Program Committee (IPC). The IPC selected for oral presentation and for publication as full papers 12 papers from GRAPP, 8 from HUCAPP, 11 papers from IVAPP, and 56 papers from VISAPP, which led to a result for the full-paper acceptance ratio of 24% and a high-quality program. 
Apart from the above full papers, the conference program also features 118 short papers and 67 poster presentations. We hope that these conference proceedings, which are submitted for indexation by Thomson Reuters Conference Proceedings Citation Index, SCOPUS, DBLP, Semantic Scholar, Google Scholar, EI and Microsoft Academic, will help the Computer Vision, Imaging, Visualization, Computer Graphics and Human-Computer Interaction communities to find interesting research work. Moreover, we are proud to inform that the program also includes three plenary keynote lectures, given by internationally distinguished researchers, namely Federico Tombari (Google and Technical University of Munich, Germany), Dieter Schmalstieg (Graz University of Technology, Austria) and Nathalie Henry Riche (Microsoft Research, United States), thus contributing to increase the overall quality of the conference and to provide a deeper understanding of the conference’s interest fields. Furthermore, a short list of the presented papers will be selected to be extended into a forthcoming book of VISIGRAPP Selected Papers to be published by Springer during 2021 in the CCIS series. Moreover, a short list of presented papers will be selected for publication of extended and revised versions in a special issue of the Springer Nature Computer Science journal. All papers presented at this conference will be available at the SCITEPRESS Digital Library. Three awards are delivered at the closing session, to recognize the best conference paper, the best student paper and the best poster for each of the four conferences. There is also an award for best industrial paper to be delivered at the closing session for VISAPP. We would like to express our thanks, first of all, to the authors of the technical papers, whose work and dedication made it possible to put together a program that we believe to be very exciting and of high technical quality. 
Next, we would like to thank the Area Chairs, all the members of the program committee and auxiliary reviewers, who helped us with their expertise and time. We would also like to thank the invited speakers for their invaluable contribution and for sharing their vision in their talks. Finally, we gratefully acknowledge the professional support of the INSTICC team for all organizational processes, especially given the need to introduce online streaming, forum management, direct messaging facilitation and other web-based activities in order to make it possible for VISIGRAPP 2021 authors to present their work and share ideas with colleagues in spite of the logistic difficulties caused by the current pandemic situation. We wish you all an exciting conference. We hope to meet you again for the next edition of VISIGRAPP, details of which are available at http://www.visigrapp.org.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes MILAB Approved no
Call Number Admin @ si @ FRB2021a Serial (down) 3627
Permanent link to this record
 

 
Author Alejandro Cartas; Petia Radeva; Mariella Dimiccoli
Title Modeling long-term interactions to enhance action recognition Type Conference Article
Year 2021 Publication 25th International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages 10351-10358
Keywords
Abstract In this paper, we propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels. At the frame level, we use a region-based approach that takes as input a primary region roughly corresponding to the user's hands and a set of secondary regions potentially corresponding to the interacting objects and calculates the action score through a CNN formulation. This information is then fed to a Hierarchical Long Short-Term Memory Network (HLSTM) that captures temporal dependencies between actions within and across shots. Ablation studies thoroughly validate the proposed approach, showing in particular that both levels of the HLSTM architecture contribute to performance improvement. Furthermore, quantitative comparisons show that the proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks, without relying on motion information.
Address January 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes MILAB; Approved no
Call Number Admin @ si @ CRD2021 Serial (down) 3626
Permanent link to this record
 

 
Author Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
Title ICDAR 2021 Competition on Document Visual Question Answering Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 635-649
Keywords
Abstract In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced one on Infographics VQA. Infographics VQA is based on a new dataset of more than 5,000 infographics images and 30,000 question-answer pairs. The winning methods have scored 0.6120 ANLS in the Infographics VQA task, 0.7743 ANLSL in the Document Collection VQA task and 0.8705 ANLS in Single Document VQA. We present a summary of the datasets used for each task, a description of each of the submitted methods, and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.
Address VIRTUAL; Lausanne; Switzerland; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ TMJ2021 Serial (down) 3624
Permanent link to this record
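
The scores in the abstract above are reported in ANLS (Average Normalized Levenshtein Similarity). The abstract does not define the metric, so the sketch below follows the commonly used formulation: for each question, take the best similarity over the accepted answers, where similarity is 1 minus the normalized edit distance, and count it as 0 when the normalized distance reaches a threshold. The threshold value of 0.5, the lowercasing and whitespace stripping, and the names `levenshtein` and `anls` are assumptions of this sketch; the official evaluation scripts may differ in detail.

```python
def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def anls(predictions, references, tau=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: one predicted string per question.
    references: one list of accepted answers per question.
    tau=0.5 and the case folding are assumptions for this sketch.
    """
    total = 0.0
    for pred, golds in zip(predictions, references):
        best = 0.0
        p = pred.strip().lower()
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            if nl < tau:  # answers that are too far off score 0
                best = max(best, 1.0 - nl)
        total += best
    return total / len(predictions)


# Illustrative predictions and accepted answers (not from the dataset).
preds = ["2021", "lausane", "xyz"]
golds = [["2021"], ["Lausanne"], ["Lausanne"]]
print(anls(preds, golds))
```

The threshold is what separates a near miss, such as a one-character recognition error, from a wrong answer: the former keeps most of its credit while the latter gets none.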
 

 
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Document Collection Visual Question Answering Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 12822 Issue Pages 778-792
Keywords Document collection; Visual Question Answering
Abstract Current tasks and methods in Document Understanding aim to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices) that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA), a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset we propose a new evaluation metric and baselines which provide further insights into the new dataset and task.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ TKV2021 Serial (down) 3622
Permanent link to this record
 

 
Author Minesh Mathew; Lluis Gomez; Dimosthenis Karatzas; C.V. Jawahar
Title Asking questions on handwritten document collections Type Journal Article
Year 2021 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume 24 Issue Pages 235-249
Keywords
Abstract This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate results of the proposed approach on two new datasets: (i) HW-SQuAD: a synthetic, handwritten document image counterpart of SQuAD1.0 dataset and (ii) BenthamQA: a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach which uses text recognized from the images using an OCR. Datasets presented in this work are available to download at docvqa.org.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ MGK2021 Serial (down) 3621
Permanent link to this record
 

 
Author Lluis Gomez; Ali Furkan Biten; Ruben Tito; Andres Mafla; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
Title Multimodal grid features and cell pointers for scene text visual question answering Type Journal Article
Year 2021 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 150 Issue Pages 242-249
Keywords
Abstract This paper presents a new model for the task of scene text visual question answering. In this task, questions about a given image can only be answered by reading and understanding scene text. Current state-of-the-art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes it difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on a single attention mechanism that attends to multi-modal features conditioned on the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance on two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data are made available through this link.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.084; 600.121 Approved no
Call Number Admin @ si @ GBT2021 Serial (down) 3620
Permanent link to this record
 

 
Author Pau Torras; Mohamed Ali Souibgui; Jialuo Chen; Alicia Fornes
Title A Transcription Is All You Need: Learning to Align through Attention Type Conference Article
Year 2021 Publication 14th IAPR International Workshop on Graphics Recognition Abbreviated Journal
Volume 12916 Issue Pages 141–146
Keywords
Abstract Historical ciphered manuscripts are a type of document where graphical symbols are used to encrypt their content instead of regular text. Nowadays, expert transcriptions can be found in libraries alongside the corresponding manuscript images. However, those transcriptions are not aligned, so they are barely usable for training deep learning-based recognition methods. To solve this issue, we propose a method to align each symbol in the transcript of an image with its visual representation by using an attention-based Sequence to Sequence (Seq2Seq) model. The core idea is that, by learning to recognise the symbol sequence within a cipher line image, the model also identifies the symbols' positions implicitly through an attention mechanism. Thus, the resulting symbol segmentation can be later used for training algorithms. The experimental evaluation shows that this method is promising, especially taking into account the small size of the cipher dataset.
Address Virtual; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference GREC
Notes DAG; 602.230; 600.140; 600.121 Approved no
Call Number Admin @ si @ TSC2021 Serial (down) 3619
Permanent link to this record
 

 
Author Jialuo Chen; Mohamed Ali Souibgui; Alicia Fornes; Beata Megyesi
Title Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images Type Conference Article
Year 2021 Publication 4th International Conference on Historical Cryptology Abbreviated Journal
Volume Issue Pages 34-37
Keywords
Abstract Historical ciphers contain a wide range of symbols from various symbol sets. Identifying the cipher alphabet is a prerequisite before decryption can take place and is a time-consuming process. In this work we explore the use of image processing for identifying the underlying alphabet in cipher images, and to compare alphabets between ciphers. The experiments show that ciphers with similar alphabets can be successfully discovered through clustering.
Address Virtual; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference HistoCrypt
Notes DAG; 602.230; 600.140; 600.121 Approved no
Call Number Admin @ si @ CSF2021 Serial (down) 3617
Permanent link to this record
 

 
Author Pau Torras; Arnau Baro; Lei Kang; Alicia Fornes
Title On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition Type Conference Article
Year 2021 Publication International Society for Music Information Retrieval Conference Abbreviated Journal
Volume Issue Pages 690-696
Keywords
Abstract Despite the latest advances in Deep Learning, the recognition of handwritten music scores is still a challenging endeavour. Even though the recent Sequence to Sequence (Seq2Seq) architectures have demonstrated their capacity to reliably recognise handwritten text, their performance is still far from satisfactory when applied to historical handwritten scores. Indeed, the ambiguous nature of handwriting, the non-standard musical notation employed by composers of the time and the decaying state of old paper make these scores remarkably difficult to read, sometimes even by trained humans. Thus, in this work we explore the incorporation of language models into a Seq2Seq-based architecture to try to improve transcriptions where the aforementioned unclear writing produces statistically unsound mistakes, which, as far as we know, has never been attempted for this field of research on this architecture. After studying various Language Model integration techniques, the experimental evaluation on historical handwritten music scores shows a significant improvement over the state of the art, showing that this is a promising research direction for dealing with such difficult manuscripts.
Address Virtual; November 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ISMIR
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ TBK2021 Serial (down) 3616
Permanent link to this record