Records | |||||
---|---|---|---|---|---|
Author | A. Pujol; Jordi Vitria; Petia Radeva; Xavier Binefa; Robert Benavente; Ernest Valveny; Craig Von Land | ||||
Title | Real time pharmaceutical product recognition using color and shape indexing | Type | Conference Article | ||
Year | 1999 | Publication | Proceedings of the 2nd International Workshop on European Scientific and Industrial Collaboration (WESIC'99), Promoting Advanced Technologies in Manufacturing | Abbreviated Journal |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Wales | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | OR;MILAB;DAG;CIC;MV | Approved | no | ||
Call Number | BCNPCL @ bcnpcl @ PVR1999 | Serial | 24 | ||
Permanent link to this record | |||||
Author | Alloy Das; Sanket Biswas; Ayan Banerjee; Josep Llados; Umapada Pal; Saumik Bhattacharya | ||||
Title | Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance | Type | Conference Article | ||
Year | 2024 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 718-728 | ||
Keywords | |||||
Abstract | The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR to focus on solving scene-text spotting for both regular and arbitrary-shaped scene text along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains (e.g. language, synth-to-real, and documents), both in terms of accuracy and efficiency. | ||||
Address | Waikoloa; Hawaii; USA; January 2024 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ DBB2024 | Serial | 3986 | ||
Permanent link to this record | |||||
Author | Alex Gomez-Villa; Bartlomiej Twardowski; Kai Wang; Joost van de Weijer | ||||
Title | Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning | Type | Conference Article | ||
Year | 2024 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1690-1700 | ||
Keywords | |||||
Abstract | Continuous unsupervised representation learning (CURL) research has greatly benefited from improvements in self-supervised learning (SSL) techniques. As a result, existing CURL methods using SSL can learn high-quality representations without any labels, but with a notable performance drop when learning on a many-tasks data stream. We hypothesize that this is caused by the regularization losses that are imposed to prevent forgetting, leading to a suboptimal plasticity-stability trade-off: they either do not adapt fully to the incoming data (low plasticity), or incur significant forgetting when allowed to fully adapt to a new SSL pretext-task (low stability). In this work, we propose to train an expert network that is relieved of the duty of keeping the previous knowledge and can focus on performing optimally on the new tasks (optimizing plasticity). In the second phase, we combine this new knowledge with the previous network in an adaptation-retrospection phase to avoid forgetting and initialize a new expert with the knowledge of the old network. We perform several experiments showing that our proposed approach outperforms other CURL exemplar-free methods in few- and many-task split settings. Furthermore, we show how to adapt our approach to semi-supervised continual learning (Semi-SCL) and show that we surpass the accuracy of other exemplar-free Semi-SCL methods and reach the results of some others that use exemplars. | ||||
Address | Waikoloa; Hawaii; USA; January 2024 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ GTW2024 | Serial | 3989 | ||
Permanent link to this record | |||||
Author | Sergi Garcia Bordils; Dimosthenis Karatzas; Marçal Rusiñol | ||||
Title | STEP – Towards Structured Scene-Text Spotting | Type | Conference Article | ||
Year | 2024 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 883-892 | ||
Keywords | |||||
Abstract | We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression. Contrary to generic scene text OCR, structured scene-text spotting seeks to dynamically condition both scene text detection and recognition on user-provided regular expressions. To tackle this task, we propose the Structured TExt sPotter (STEP), a model that exploits the provided text structure to guide the OCR process. STEP is able to deal with regular expressions that contain spaces and is not bound to detection at the word-level granularity. Our approach enables accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios and is solely trained on publicly available data. To demonstrate the effectiveness of our approach, we introduce a new challenging test dataset that contains several types of out-of-vocabulary structured text, reflecting important reading applications involving fields such as prices, dates, serial numbers, and license plates. We demonstrate that STEP can provide specialised OCR performance on demand in all tested scenarios. | ||||
Address | Waikoloa; Hawaii; USA; January 2024 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ GKR2024 | Serial | 3992 | ||
Permanent link to this record | |||||
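The STEP record above distinguishes regex-conditioned spotting from generic OCR followed by filtering. As a point of contrast only, here is a minimal sketch of that naive post-hoc baseline: run a hypothetical word-level OCR, then keep detections whose transcription matches the query expression. The function name and the OCR detections are illustrative assumptions; this baseline is exactly what STEP improves on, since it cannot handle expressions spanning multiple words or sub-word granularity.

```python
import re

def naive_structured_spotting(ocr_words, pattern):
    """Post-hoc baseline (not STEP): filter generic word-level OCR output
    by a query regular expression. Fails for patterns containing spaces,
    because each detection is a single word."""
    query = re.compile(pattern)
    return [(box, text) for box, text in ocr_words if query.fullmatch(text)]

# Hypothetical OCR output: (bounding box, transcription) pairs.
detections = [((10, 10, 80, 30), "23/01/2024"),
              ((90, 10, 150, 30), "OPEN"),
              ((10, 40, 90, 60), "12.50")]
dates = naive_structured_spotting(detections, r"\d{2}/\d{2}/\d{4}")
```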
Author | Hunor Laczko; Meysam Madadi; Sergio Escalera; Jordi Gonzalez | ||||
Title | A Generative Multi-Resolution Pyramid and Normal-Conditioning 3D Cloth Draping | Type | Conference Article | ||
Year | 2024 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 8709-8718 | ||
Keywords | |||||
Abstract | RGB cloth generation has been deeply studied in the related literature, however, 3D garment generation remains an open problem. In this paper, we build a conditional variational autoencoder for 3D garment generation and draping. We propose a pyramid network to add garment details progressively in a canonical space, i.e. unposing and unshaping the garments w.r.t. the body. We study conditioning the network on surface normal UV maps, as an intermediate representation, which is an easier problem to optimize than 3D coordinates. Our results on two public datasets, CLOTH3D and CAPE, show that our model is robust, controllable in terms of detail generation by the use of multi-resolution pyramids, and achieves state-of-the-art results that can highly generalize to unseen garments, poses, and shapes even when training with small amounts of data. | ||||
Address | Waikoloa; Hawaii; USA; January 2024 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | ISE; HUPBA | Approved | no | ||
Call Number | Admin @ si @ LME2024 | Serial | 3996 | ||
Permanent link to this record | |||||
Author | Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar | ||||
Title | Watching the News: Towards VideoQA Models that can Read | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision | Abbreviated Journal |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. Current VideoQA approaches ignore the textual information present in the video. Instead, we argue that textual information is complementary to the action and provides essential contextualisation cues to the reasoning process. To this end, we propose a novel VideoQA task that requires reading and understanding the text in the video. To explore this direction, we focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the "NewsVideoQA" dataset that comprises more than 8,600 QA pairs on 3,000+ news videos obtained from diverse news channels from around the world. We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods. | ||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ JMK2023 | Serial | 3899 | ||
Permanent link to this record | |||||
Author | Marcos V Conde; Florin Vasluianu; Javier Vazquez; Radu Timofte | ||||
Title | Perceptual image enhancement for smartphone real-time applications | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1848-1858 | ||
Keywords | |||||
Abstract | Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones. Our experiments show that, with much fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images under 1 second in mid-level commercial smartphones. | ||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | MACO; CIC | Approved | no | ||
Call Number | Admin @ si @ CVV2023 | Serial | 3900 | ||
Permanent link to this record | |||||
Author | Dipam Goswami; R Schuster; Joost Van de Weijer; Didier Stricker | ||||
Title | Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 3195-3204 | ||
Keywords | |||||
Abstract | |||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ GSW2023 | Serial | 3901 | ||
Permanent link to this record | |||||
Author | Mickael Cormier; Andreas Specker; Julio C. S. Jacques; Lucas Florin; Jurgen Metzler; Thomas B. Moeslund; Kamal Nasrollahi; Sergio Escalera; Jurgen Beyerer | ||||
Title | UPAR Challenge: Pedestrian Attribute Recognition and Attribute-based Person Retrieval – Dataset, Design, and Results | Type | Conference Article | ||
Year | 2023 | Publication | 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops | Abbreviated Journal | |
Volume | Issue | Pages | 166-175 | ||
Keywords | |||||
Abstract | In civilian video security monitoring, retrieving and tracking a person of interest often rely on witness testimony and their appearance description. Deployed systems rely on a large amount of annotated training data and are expected to show consistent performance in diverse areas and generalize well between diverse settings w.r.t. different viewpoints, illumination, resolution, occlusions, and poses for indoor and outdoor scenes. However, for such generalization, the system would require a large amount of various annotated data for training and evaluation. The WACV 2023 Pedestrian Attribute Recognition and Attribute-based Person Retrieval Challenge (UPAR-Challenge) aimed to spotlight the problem of domain gaps in a real-world surveillance context and highlight the challenges and limitations of existing methods. The UPAR dataset, composed of 40 important binary attributes over 12 attribute categories across four datasets, was extended with data captured from a low-flying UAV from the P-DESTRE dataset. To this aim, 0.6M additional annotations were manually labeled and validated. Each track evaluated the robustness of the competing methods to domain shifts by training on limited data from a specific domain and evaluating using data from unseen domains. The challenge attracted 41 registered participants, but only one team managed to outperform the baseline on one track, emphasizing the task's difficulty. This work describes the challenge design, the adopted dataset, obtained results, as well as future directions on the topic. | ||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACVW | ||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ CSJ2023 | Serial | 3902 | ||
Permanent link to this record | |||||
Author | Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar | ||||
Title | InfographicVQA | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1697-1706 | ||
Keywords | Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages | ||||
Abstract | Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two strong Transformer-based baselines. Both baselines yield unsatisfactory results compared to near-perfect human performance on the dataset. The results suggest that VQA on infographics, images that are designed to communicate information quickly and clearly to the human brain, is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa.org | ||||
Address | Virtual; Waikoloa; Hawaii; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155 | Approved | no | ||
Call Number | MBT2022 | Serial | 3625 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1381-1390 | ||
Keywords | Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data | ||||
Abstract | Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is not desirable to humans. To decrease object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences, which require no new training data or increase in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models' object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online. | ||||
Address | Virtual; Waikoloa; Hawaii; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155; 302.105 | Approved | no | ||
Call Number | Admin @ si @ BGK2022 | Serial | 3662 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1391-1400 | ||
Keywords | Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning | ||||
Abstract | The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github.com/furkanbiten/ncsmetric and the model implementation at github.com/andrespmd/semanticadaptive_margin. | ||||
Address | Virtual; Waikoloa; Hawaii; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155; 302.105 | Approved | no | ||
Call Number | Admin @ si @ BMG2022 | Serial | 3663 | ||
Permanent link to this record | |||||
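The abstract above describes optimizing a Semantic Adaptive Margin (SAM), derived from the CIDEr captioning metric, inside a standard triplet loss. A minimal sketch of that idea follows; the exact functional form of the margin, the function name, and the [0, 1] relevance scaling are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sam_triplet_loss(sim_pos, sim_neg, relevance_neg, base_margin=0.2):
    """Triplet ranking loss with a semantically adaptive margin (sketch).
    relevance_neg in [0, 1] scores how semantically relevant the 'negative'
    item is to the query (the paper derives relevance from CIDEr between
    captions). Highly relevant negatives get a smaller margin, so they are
    pushed away less aggressively than truly irrelevant ones."""
    adaptive_margin = base_margin * (1.0 - relevance_neg)
    return float(np.maximum(0.0, adaptive_margin + sim_neg - sim_pos))
```

With a fully irrelevant negative (`relevance_neg=0.0`) this reduces to the usual fixed-margin triplet loss; with a highly relevant negative the margin shrinks toward zero, so near-duplicate captions are not penalized as hard negatives.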
Author | Javad Zolfaghari Bengar; Joost Van de Weijer; Laura Lopez-Fuentes; Bogdan Raducanu | ||||
Title | Class-Balanced Active Learning for Image Classification | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Active learning aims to reduce the labeling effort that is required to train algorithms by learning an acquisition function selecting the most relevant data for which a label should be requested from a large unlabeled data pool. Active learning is generally studied on balanced datasets where an equal number of images per class is available. However, real-world datasets suffer from severely imbalanced classes, the so-called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we propose a general optimization framework that explicitly takes class-balancing into account. Results on three datasets show that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we show that our method generally results in a performance gain on balanced datasets as well. | ||||
Address | Virtual; Waikoloa; Hawaii; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | LAMP; 602.200; 600.147; 600.120 | Approved | no | ||
Call Number | Admin @ si @ ZWL2022 | Serial | 3703 | ||
Permanent link to this record | |||||
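The record above describes an optimization framework that makes active learning acquisition class-balancing aware. One simple way such a criterion can be sketched, assuming an entropy-based informativeness term plus a bonus for classes under-represented in the labeled pool, is shown below; the scoring form, weighting, and function name are assumptions, not the paper's actual framework.

```python
import numpy as np

def class_balanced_acquisition(probs, labeled_class_counts, lam=1.0):
    """Sketch of a class-balancing acquisition score (assumed form).
    For each unlabeled sample: predictive entropy (informativeness) plus a
    bonus when the sample's predicted class is rare in the labeled pool."""
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    counts = np.asarray(labeled_class_counts, dtype=float)
    rarity = 1.0 - counts / counts.sum()          # high for rare classes
    balance_bonus = rarity[np.argmax(probs, axis=1)]
    return entropy + lam * balance_bonus

# Toy pool: class 1 is under-represented among the 100 labeled samples.
scores = class_balanced_acquisition(
    probs=[[0.9, 0.1], [0.5, 0.5]],
    labeled_class_counts=[90, 10])
```

Higher scores mean the sample should be queried first; the uncertain sample receives the larger score here, and a confident prediction of the rare class would also be boosted.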
Author | Jialuo Chen; Mohamed Ali Souibgui; Alicia Fornes; Beata Megyesi | ||||
Title | Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images | Type | Conference Article | ||
Year | 2021 | Publication | 4th International Conference on Historical Cryptology | Abbreviated Journal | |
Volume | Issue | Pages | 34-37 | ||
Keywords | |||||
Abstract | Historical ciphers contain a wide range of symbols from various symbol sets. Identifying the cipher alphabet is a prerequisite before decryption can take place and is a time-consuming process. In this work we explore the use of image processing for identifying the underlying alphabet in cipher images, and for comparing alphabets between ciphers. The experiments show that ciphers with similar alphabets can be successfully discovered through clustering. | ||||
Address | Virtual; September 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | HistoCrypt | ||
Notes | DAG; 602.230; 600.140; 600.121 | Approved | no | ||
Call Number | Admin @ si @ CSF2021 | Serial | 3617 | ||
Permanent link to this record | |||||
Author | Pau Torras; Mohamed Ali Souibgui; Jialuo Chen; Alicia Fornes | ||||
Title | A Transcription Is All You Need: Learning to Align through Attention | Type | Conference Article | ||
Year | 2021 | Publication | 14th IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | 12916 | Issue | Pages | 141–146 | |
Keywords | |||||
Abstract | Historical ciphered manuscripts are a type of document where graphical symbols are used to encrypt their content instead of regular text. Nowadays, expert transcriptions can be found in libraries alongside the corresponding manuscript images. However, those transcriptions are not aligned, so they are barely usable for training deep learning-based recognition methods. To solve this issue, we propose a method to align each symbol in the transcript of an image with its visual representation by using an attention-based Sequence to Sequence (Seq2Seq) model. The core idea is that, by learning to recognise the symbol sequence within a cipher line image, the model also identifies their positions implicitly through an attention mechanism. Thus, the resulting symbol segmentation can be later used for training algorithms. The experimental evaluation shows that this method is promising, especially taking into account the small size of the cipher dataset. | ||||
Address | Virtual; September 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | GREC | ||
Notes | DAG; 602.230; 600.140; 600.121 | Approved | no | ||
Call Number | Admin @ si @ TSC2021 | Serial | 3619 | ||
Permanent link to this record |
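The alignment idea in the record above, that a Seq2Seq decoder's attention over a cipher line image implies symbol positions, can be sketched with plain arrays: for each decoded symbol, take the most-attended encoder frame and map it back to a pixel coordinate. The attention map, frame geometry, and function name below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def align_from_attention(attention, frame_width):
    """Given attention weights from a Seq2Seq decoder (one row per decoded
    symbol, one column per encoder frame along the line image), assign each
    transcript symbol the x-coordinate of its most-attended frame. This is
    the implicit segmentation described in the abstract."""
    attention = np.asarray(attention, dtype=float)
    frames = attention.argmax(axis=1)             # most-attended frame index
    centers = (frames + 0.5) * frame_width        # map frame -> pixel x
    return centers

# Toy attention map: 3 decoded symbols over 6 encoder frames, 8 px per frame.
attn = [[0.7, 0.2, 0.1, 0.0, 0.0, 0.0],
        [0.0, 0.1, 0.8, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.1, 0.2, 0.7]]
positions = align_from_attention(attn, frame_width=8)
```

In practice the attention rows would come from the trained recogniser, and the resulting positions would seed the symbol segmentation used to train downstream algorithms.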