Author Maciej Wielgosz; Antonio Lopez; Muhamad Naveed Riaz
  Title CARLA-BSP: a simulated dataset with pedestrians Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract We present a sample dataset featuring pedestrians generated using the ARCANE framework, a new framework for generating datasets in CARLA (0.9.13). We provide use cases for pedestrian detection, autoencoding, pose estimation, and pose lifting. We also showcase baseline results.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no
  Call Number Admin @ si @ WLN2023 Serial 3866  
Permanent link to this record
 

 
Author Akhil Gurram; Antonio Lopez
  Title On the Metrics for Evaluating Monocular Depth Estimation Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Monocular Depth Estimation (MDE) is performed to produce 3D information that can be used in downstream tasks such as those related to on-board perception for Autonomous Vehicles (AVs) or driver assistance. A relevant question that arises, therefore, is whether the standard metrics for MDE assessment are a good indicator of the accuracy of future MDE-based driving-related perception tasks. We address this question in this paper. In particular, we take the task of 3D object detection on point clouds as a proxy of on-board perception. We train and test state-of-the-art 3D object detectors using 3D point clouds coming from MDE models. We compare the ranking of object detection results with the ranking given by the depth estimation metrics of the MDE models. We conclude that, indeed, MDE evaluation metrics give rise to a ranking of methods that reflects relatively well the 3D object detection results we may expect. Among the different metrics, the absolute relative (abs-rel) error seems to be the best suited for that purpose.
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no
  Call Number Admin @ si @ GuL2023 Serial 3867  
Permanent link to this record
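Note on the record above: the absolute relative (abs-rel) error singled out in the abstract is one of the standard MDE evaluation metrics. A minimal NumPy sketch of abs-rel and RMSE follows; the function names and the validity mask are illustrative choices, not code from the paper.

```python
import numpy as np

def abs_rel(pred, gt):
    """Absolute relative error: mean(|pred - gt| / gt) over valid ground-truth pixels."""
    mask = gt > 0                      # keep only pixels with valid depth
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def rmse(pred, gt):
    """Root mean squared error over valid ground-truth pixels."""
    mask = gt > 0
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))
```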
 

 
Author Yi Xiao; Felipe Codevilla; Diego Porres; Antonio Lopez
  Title Scaling Vision-Based End-to-End Autonomous Driving with Multi-View Attention Learning Type Conference Article
  Year 2023 Publication International Conference on Intelligent Robots and Systems Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some of the latest models achieve better performance than CILRS by using expensive sensor suites and/or by using large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS by both processing higher-resolution images using a human-inspired horizontal field of view (HFOV) as an inductive bias and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.
  Address Detroit; USA; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IROS  
  Notes ADAS Approved no
  Call Number Admin @ si @ XCP2023 Serial 3930  
Permanent link to this record
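As a rough illustration of the training signal described in the abstract above (supervision on vehicle signals only, via conditional imitation learning), here is a schematic PyTorch training step. The model interface, batch field names and the plain L1 loss are assumptions made for the sketch, not the CIL++ implementation.

```python
import torch
import torch.nn.functional as F

def imitation_step(model, batch, optimizer):
    """One conditional imitation learning step: regress vehicle control signals
    from camera images, conditioned on the high-level navigation command.
    No human labeling of sensor data is involved, only recorded vehicle signals."""
    images = batch["rgb"]          # camera frames (assumed field name)
    command = batch["command"]     # e.g. turn-left / turn-right / go-straight
    speed = batch["speed"]
    target = batch["controls"]     # recorded [steering, acceleration]

    pred = model(images, command, speed)   # assumed model signature
    loss = F.l1_loss(pred, target)         # supervised only on vehicle signals

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```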
 

 
Author Jose Luis Gomez
  Title Synth-to-real semi-supervised learning for visual tasks Type Book Whole
  Year 2023 Publication Going beyond Classification Problems for the Continual Learning of Deep Neural Networks Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The curse of data labeling is a costly bottleneck in supervised deep learning, where large amounts of labeled data are needed to train intelligent systems. In onboard perception for autonomous driving, this cost corresponds to the labeling of raw data from sensors such as cameras, LiDARs, RADARs, etc. Therefore, synthetic data with automatically generated ground truth (labels) has emerged as a reliable alternative for training onboard perception models.
However, synthetic data commonly suffers from synth-to-real domain shift, i.e., models trained on the synthetic domain do not show their achievable accuracy when performing in the real world. This shift needs to be addressed by techniques falling in the realm of domain adaptation (DA).
The semi-supervised learning (SSL) paradigm can be followed to address DA. In this case, a model is trained using source data with labels (here synthetic) and leverages minimal knowledge from target data (here the real world) to generate pseudo-labels. These pseudo-labels help the training process to reduce the gap between the source and the target domains. In general, we can assume access to both pseudo-labels and a small amount of human-provided labels for the target-domain data. However, the most interesting and challenging setting consists of assuming that we do not have human-provided labels at all. This setting is known as unsupervised domain adaptation (UDA). This PhD focuses on applying SSL to the UDA setting, for onboard visual tasks related to autonomous driving. We start by addressing the synth-to-real UDA problem on onboard vision-based object detection (pedestrians and cars), a critical task for autonomous driving and driving assistance. In particular, we propose to apply an SSL technique known as co-training, which we adapt to work with deep models that process a multi-modal input. The multi-modality consists of the visual appearance of the images (RGB) and their monocular depth estimation. The synthetic data we use as the source domain contains both object bounding boxes and depth information. This prior knowledge is the starting point for the co-training technique, which iteratively labels unlabeled real-world data and uses such pseudo-labels (here bounding boxes with an assigned object class) to progressively improve the labeling results. Along this process, two models collaborate to automatically label the images, in a way that one model compensates for the errors of the other, thus avoiding error drift. While this automatic labeling process is done offline, the resulting pseudo-labels can be used to train object detection models that must perform in real time onboard a vehicle. We show that multi-modal co-training improves the labeling results compared to single-modal co-training, remaining competitive compared to human labeling.
Given the success of co-training in the context of object detection, we have also adapted this technique to a more crucial and challenging visual task, namely, onboard semantic segmentation. In fact, providing labels for a single image can take from 30 to 90 minutes for a human labeler, depending on the content of the image. Thus, developing automatic labeling techniques for this visual task is of great interest to the automotive industry. In particular, the new co-training framework addresses synth-to-real UDA by an initial stage of self-training. Intermediate models arising from this stage are used to start the co-training procedure, for which we have elaborated an accurate collaboration policy between the two models performing the automatic labeling. Moreover, our co-training seamlessly leverages datasets from different synthetic domains. In addition, the co-training procedure is agnostic to the loss function used to train the semantic segmentation models which perform the automatic labeling. We achieve state-of-the-art results on publicly available benchmark datasets, again remaining competitive compared to human labeling.
Finally, building on our previous experience, we have designed and implemented a new SSL technique for UDA in the context of visual semantic segmentation. In this case, we mimic the labeling methodology followed by human labelers. In particular, rather than labeling full images at a time, categories of semantic classes are defined and only those are labeled in a labeling pass. In fact, different human labelers can become specialists in labeling different categories. Afterward, these per-category-labeled layers are combined to provide fully labeled images. Our technique is inspired by this methodology, since we perform synth-to-real UDA per category, using the self-training stage previously developed as part of our co-training framework. The pseudo-labels obtained for each category are finally fused to obtain fully automatically labeled images. In this context, we have also contributed to the development of a new photo-realistic synthetic dataset based on path-tracing rendering. Our new SSL technique seamlessly leverages publicly available synthetic datasets as well as this new one to obtain state-of-the-art results on synth-to-real UDA for semantic segmentation. We show that the new dataset allows us to reach better labeling accuracy than previously existing datasets, while also complementing them well when combined. Moreover, we also show that the new human-inspired SSL technique outperforms co-training.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no
  Call Number Admin @ si @ Gom2023 Serial 3961  
Permanent link to this record
 

 
Author Yi Xiao
  Title Advancing Vision-based End-to-End Autonomous Driving Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In autonomous driving, artificial intelligence (AI) processes the traffic environment to drive the vehicle to a desired destination. Currently, there are different paradigms that address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that attempt to learn the direct mapping of raw data from input sensors to vehicle control signals. The latter are relatively less studied but are gaining popularity as they are less demanding in terms of data labeling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving.
We propose to evaluate three approaches to tackle the challenge of end-to-end autonomous driving. First, we focus on the input, considering adding depth information as complementary to RGB data, in order to mimic the human ability to estimate the distance to obstacles. Notice that, in the real world, these depth maps can be obtained either from a LiDAR sensor or from a trained monocular depth estimation module, where human labeling is not needed. Then, based on the intuition that the latent space of end-to-end driving models encodes relevant information for driving, we use it as prior knowledge for training an affordance-based driving model. In this case, the trained affordance-based model can achieve good performance while requiring less human-labeled data, and it can provide interpretability regarding driving actions. Finally, we present a new pure vision-based end-to-end driving model termed CIL++, which is trained by imitation learning. CIL++ leverages modern best practices, such as a large horizontal field of view and a self-attention mechanism, which contribute to the agent's understanding of the driving scene and lead to a better imitation of human drivers. Using training data without any human labeling, our model yields almost expert performance in the CARLA NoCrash benchmark and could rival SOTA models that require large amounts of human-labeled data.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-4-6 Medium  
  Area Expedition Conference  
  Notes ADAS Approved no
  Call Number Admin @ si @ Xia2023 Serial 3964  
Permanent link to this record
 

 
Author Jose Luis Gomez; Manuel Silva; Antonio Seoane; Agnes Borras; Mario Noriega; German Ros; Jose Antonio Iglesias; Antonio Lopez
  Title All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract We introduce UrbanSyn, a photorealistic dataset acquired through semi-procedurally generated synthetic urban driving scenarios. Developed using high-quality geometry and materials, UrbanSyn provides pixel-level ground truth, including depth, semantic segmentation, and instance segmentation with object bounding boxes and occlusion degree. It complements GTAV and Synscapes datasets to form what we coin as the 'Three Musketeers'. We demonstrate the value of the Three Musketeers in unsupervised domain adaptation for image semantic segmentation. Results on real-world datasets, Cityscapes, Mapillary Vistas, and BDD100K, establish new benchmarks, largely attributed to UrbanSyn. We make UrbanSyn openly and freely accessible (this http URL).  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no
  Call Number Admin @ si @ GSS2023 Serial 4015  
Permanent link to this record
 

 
Author Jose Luis Gomez; Gabriel Villalonga; Antonio Lopez
  Title Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models Type Journal Article
  Year 2023 Publication Sensors – Special Issue on “Machine Learning for Autonomous Driving Perception and Prediction” Abbreviated Journal SENS  
  Volume 23 Issue 2 Pages 621  
  Keywords Domain adaptation; semi-supervised learning; Semantic segmentation; Autonomous driving  
  Abstract Semantic image segmentation is a central and challenging task in autonomous driving, addressed by training deep models. Since this training suffers from the curse of human-based image labeling, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies addressing an unsupervised domain adaptation (UDA) problem. In this paper, we propose a new co-training procedure for synth-to-real UDA of semantic segmentation models. It consists of a self-training stage, which provides two domain-adapted models, and a model collaboration loop for the mutual improvement of these two models. These models are then used to provide the final semantic segmentation labels (pseudo-labels) for the real-world images. The overall procedure treats the deep models as black boxes and drives their collaboration at the level of pseudo-labeled target images, i.e., neither modifying loss functions nor explicit feature alignment is required. We test our proposal on standard synthetic and real-world datasets for on-board semantic segmentation. Our procedure shows improvements ranging from ∼13 to ∼26 mIoU points over baselines, thus establishing new state-of-the-art results.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; no proj Approved no
  Call Number Admin @ si @ GVL2023 Serial 3705  
Permanent link to this record
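To make the pseudo-labelling idea in the abstract above concrete, the sketch below thresholds per-pixel confidence to build a pseudo-label map for a target-domain image. It is a generic illustration of self-/co-training pseudo-labels, not the paper's collaboration policy; the confidence threshold and ignore index are assumed values.

```python
import torch

@torch.no_grad()
def pseudo_label(seg_model, image, threshold=0.9, ignore_index=255):
    """Per-pixel pseudo-labels for an unlabeled real-world image.
    Pixels whose predicted class confidence falls below `threshold`
    are marked with `ignore_index` so they do not contribute to training."""
    logits = seg_model(image.unsqueeze(0))        # (1, C, H, W)
    probs = torch.softmax(logits, dim=1)[0]       # (C, H, W)
    confidence, labels = probs.max(dim=0)         # (H, W) each
    labels[confidence < threshold] = ignore_index
    return labels
```

In a co-training loop, two such models would exchange pseudo-labels so that each one is fine-tuned on the other's confident predictions.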
 

 
Author Danna Xue; Javier Vazquez; Luis Herranz; Yang Zhang; Michael S Brown
  Title Integrating High-Level Features for Consistent Palette-based Multi-image Recoloring Type Journal Article
  Year 2023 Publication Computer Graphics Forum Abbreviated Journal CGF  
  Volume Issue Pages  
  Keywords  
  Abstract Achieving visually consistent colors across multiple images is important when images are used in photo albums, websites, and brochures. Unfortunately, only a handful of methods address multi-image color consistency compared to one-to-one color transfer techniques. Furthermore, existing methods do not incorporate high-level features that can assist graphic designers in their work. To address these limitations, we introduce a framework that builds upon a previous palette-based color consistency method and incorporates three high-level features: white balance, saliency, and color naming. We show how these features overcome the limitations of the prior multi-consistency workflow and showcase the user-friendly nature of our framework.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes CIC; MACO Approved no
  Call Number Admin @ si @ XVH2023 Serial 3883  
Permanent link to this record
 

 
Author Danna Xue; Luis Herranz; Javier Vazquez; Yanning Zhang
  Title Burst Perception-Distortion Tradeoff: Analysis and Evaluation Type Conference Article
  Year 2023 Publication IEEE International Conference on Acoustics, Speech and Signal Processing Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Burst image restoration attempts to effectively utilize the complementary cues appearing in sequential images to produce a high-quality image. Most current methods use all the available images to obtain the reconstructed image. However, using more images for burst restoration is not always the best option regarding reconstruction quality and efficiency, as the images acquired by handheld imaging devices suffer from degradation and misalignment caused by camera noise and shake. In this paper, we extend the perception-distortion tradeoff theory by introducing multiple-frame information. We propose the area of the unattainable region as a new metric for perception-distortion tradeoff evaluation and comparison. Based on this metric, we analyse the performance of burst restoration from the perspective of the perception-distortion tradeoff under both aligned and misaligned burst situations. Our analysis reveals the importance of inter-frame alignment for burst restoration and shows that the optimal burst length for the restoration model depends both on the degree of degradation and misalignment.
  Address Rhodes Island; Greece; June 2023
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICASSP  
  Notes CIC; MACO Approved no
  Call Number Admin @ si @ XHV2023 Serial 3909  
Permanent link to this record
 

 
Author Kunal Biswas; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Michel Blumenstein; Josep Llados
  Title Classification of aesthetic natural scene images using statistical and semantic features Type Journal Article
  Year 2023 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume 82 Issue 9 Pages 13507-13532  
  Keywords  
  Abstract Aesthetic image analysis is essential for improving the performance of multimedia image retrieval systems, especially from a repository of social media and multimedia content stored on mobile devices. This paper presents a novel method for classifying aesthetic natural scene images by studying the naturalness of image content using statistical features, and reading text in the images using semantic features. Unlike existing methods that focus only on image quality with human information, the proposed approach focuses on image features as well as text-based semantic features without human intervention to reduce the gap between subjectivity and objectivity in the classification. The aesthetic classes considered in this work are (i) Very Pleasant, (ii) Pleasant, (iii) Normal and (iv) Unpleasant. The naturalness is represented by features of focus, defocus, perceived brightness, perceived contrast, blurriness and noisiness, while semantics are represented by text recognition, description of the images and labels of images, profile pictures, and banner images. Furthermore, a deep learning model is proposed in a novel way to fuse statistical and semantic features for the classification of aesthetic natural scene images. Experiments on our own dataset and the standard datasets demonstrate that the proposed approach achieves 92.74%, 88.67% and 83.22% average classification rates on our own dataset, AVA dataset and CUHKPQ dataset, respectively. Furthermore, a comparative study of the proposed model with the existing methods shows that the proposed method is effective for the classification of aesthetic social media images.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no
  Call Number Admin @ si @ BSP2023 Serial 3873  
Permanent link to this record
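The abstract above describes fusing hand-crafted statistical features (focus, brightness, contrast, blur, noise) with semantic text-derived features before classifying into four aesthetic classes. A minimal two-branch fusion head is sketched below; the feature dimensions and layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Toy fusion head: project statistical and semantic feature vectors
    separately, concatenate, and classify into the four aesthetic classes
    (Very Pleasant, Pleasant, Normal, Unpleasant)."""
    def __init__(self, stat_dim=6, sem_dim=300, hidden=128, num_classes=4):
        super().__init__()
        self.stat_branch = nn.Sequential(nn.Linear(stat_dim, hidden), nn.ReLU())
        self.sem_branch = nn.Sequential(nn.Linear(sem_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, stat_feats, sem_feats):
        fused = torch.cat([self.stat_branch(stat_feats),
                           self.sem_branch(sem_feats)], dim=-1)
        return self.classifier(fused)
```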
 

 
Author Asma Bensalah; Antonio Parziale; Giuseppe De Gregorio; Angelo Marcelli; Alicia Fornes; Josep Llados
  Title I Can’t Believe It’s Not Better: In-air Movement for Alzheimer Handwriting Synthetic Generation Type Conference Article
  Year 2023 Publication 21st International Graphonomics Conference Abbreviated Journal  
  Volume Issue Pages 136–148  
  Keywords  
  Abstract During recent years, there has been a boom in the use of deep learning for handwriting analysis and recognition. One main application of handwriting analysis is early detection and diagnosis in the health field. Unfortunately, most real-case problems still suffer from a scarcity of data, which makes the use of deep learning-based models difficult. To alleviate this problem, some works resort to synthetic data generation. Lately, more works are directed towards guided synthetic data generation, i.e., generation that uses domain and data knowledge to produce realistic data that can be useful for training deep learning models. In this work, we combine domain knowledge about Alzheimer's disease and handwriting and use it for more guided data generation. Concretely, we have explored the use of in-air movements for synthetic data generation.
  Address Evora; Portugal; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IGS  
  Notes DAG Approved no
  Call Number Admin @ si @ BPG2023 Serial 3838  
Permanent link to this record
 

 
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
  Title Hierarchical multimodal transformers for Multipage DocVQA Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 144 Issue 109834 Pages  
  Keywords  
  Abstract Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method, Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification, our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no
  Call Number Admin @ si @ TKV2023 Serial 3836  
Permanent link to this record
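The bottom-up idea described in the abstract above (summarize each page, then decode the answer from the concatenated page summaries) can be sketched as follows. Dimensions, number of summary tokens and layer counts are arbitrary; this is an illustration of the hierarchical pattern, not the Hi-VT5 architecture.

```python
import torch
import torch.nn as nn

class HierarchicalReader(nn.Module):
    """Encode every page into a few summary tokens, then decode an answer
    attending only to the concatenated page summaries (bottom-up approach)."""
    def __init__(self, d_model=512, n_summary=8):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.page_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.answer_decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.summary_tokens = nn.Parameter(torch.randn(n_summary, d_model))

    def forward(self, pages, answer_emb):
        # pages: list of (page_len, d_model) token embeddings, one per page
        summaries = []
        for tokens in pages:
            x = torch.cat([self.summary_tokens, tokens], dim=0).unsqueeze(0)
            encoded = self.page_encoder(x)                     # (1, n_summary + page_len, d)
            summaries.append(encoded[0, : self.summary_tokens.size(0)])
        memory = torch.cat(summaries, dim=0).unsqueeze(0)      # (1, n_pages * n_summary, d)
        return self.answer_decoder(answer_emb.unsqueeze(0), memory)
```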
 

 
Author Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas
  Title Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages  
  Keywords Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning  
  Abstract In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference AAAI  
  Notes DAG Approved no
  Call Number Admin @ si @ SBM2023 Serial 3848  
Permanent link to this record
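The abstract above builds on the idea of learning representations that are invariant to document degradation. A generic degradation-invariant reconstruction objective is sketched below as an illustration; the paper uses three specific pretext tasks that are not reproduced here, and the simple additive-noise model shown is an assumption.

```python
import torch
import torch.nn.functional as F

def degradation_invariant_loss(encoder, decoder, clean_images, noise_std=0.2):
    """Reconstruct the clean document image from a synthetically degraded copy,
    so the encoder must discard the degradation (denoising-autoencoder style)."""
    degraded = clean_images + noise_std * torch.randn_like(clean_images)
    degraded = degraded.clamp(0.0, 1.0)          # keep a valid pixel range
    latent = encoder(degraded)                   # features meant to be degradation-invariant
    reconstruction = decoder(latent)
    return F.mse_loss(reconstruction, clean_images)
```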
 

 
Author Mohamed Ali Souibgui; Pau Torras; Jialuo Chen; Alicia Fornes
  Title An Evaluation of Handwritten Text Recognition Methods for Historical Ciphered Manuscripts Type Conference Article
  Year 2023 Publication 7th International Workshop on Historical Document Imaging and Processing Abbreviated Journal  
  Volume Issue Pages 7-12  
  Keywords  
  Abstract This paper investigates the effectiveness of different deep learning HTR families, including LSTM, Seq2Seq, and transformer-based approaches with self-supervised pretraining, in recognizing ciphered manuscripts from different historical periods and cultures. The goal is to identify the most suitable method or training techniques for recognizing ciphered manuscripts and to provide insights into the challenges and opportunities in this field of research. We evaluate the performance of these models on several datasets of ciphered manuscripts and discuss their results. This study contributes to the development of more accurate and efficient methods for recognizing historical manuscripts for the preservation and dissemination of our cultural heritage.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference HIP  
  Notes DAG Approved no
  Call Number Admin @ si @ STC2023 Serial 3849  
Permanent link to this record
 

 
Author Pau Torras; Mohamed Ali Souibgui; Sanket Biswas; Alicia Fornes
  Title Segmentation-Free Alignment of Arbitrary Symbol Transcripts to Images Type Conference Article
  Year 2023 Publication Document Analysis and Recognition – ICDAR 2023 Workshops Abbreviated Journal  
  Volume 14193 Issue Pages 83-93  
  Keywords Historical Manuscripts; Symbol Alignment  
  Abstract Developing arbitrary symbol recognition systems is a challenging endeavour. Even using content-agnostic architectures such as few-shot models, performance can be substantially improved by providing a number of well-annotated examples during training. In some contexts, transcripts of the symbols are available without any position information associated with them, which enables using line-level recognition architectures. A way of providing this position information to detection-based architectures is finding systems that can align the input symbols with the transcription. In this paper we discuss some symbol alignment techniques that are suitable for low-data scenarios and provide insight into their perceived strengths and weaknesses. In particular, we study the usage of Connectionist Temporal Classification (CTC) models and attention-based sequence-to-sequence models, and we compare them with the results obtained with a few-shot recognition system.
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no
  Call Number Admin @ si @ TSS2023 Serial 3850  
Permanent link to this record
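Since the abstract above compares Connectionist Temporal Classification (CTC) models with attention-based sequence-to-sequence models for aligning transcripts to images, here is a minimal example of the CTC objective in PyTorch, which learns transcription without per-symbol position annotations. Shapes, vocabulary size and the random inputs are illustrative only.

```python
import torch
import torch.nn as nn

# Toy setup: T time steps of visual features, N samples, C symbol classes (class 0 = blank).
T, N, C, max_target_len = 60, 4, 30, 12
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)  # stand-in model outputs
targets = torch.randint(1, C, (N, max_target_len))                       # symbol transcripts (no positions)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, max_target_len + 1, (N,))

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)  # alignment-free objective
loss.backward()  # in a real system this would update the recognizer
```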