toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Souhail Bakkali; Sanket Biswas; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades; Josep Llados edit   pdf
url  openurl
  Title TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, they rely on an extensive amount of document data to learn their pretext objectives in a ``pre-train-then-fine-tune'' paradigm and thus, suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on OCR engines to extract local positional information within a document page. Therefore, this hinders the model's generalizability, flexibility and robustness due to the lack of capturing global information within a document image. We introduce TransferDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives. TransferDoc learns richer semantic concepts by unifying language and visual representations, which enables the production of more transferable models. Besides, two novel downstream tasks have been introduced for a ``closer-to-real'' industrial evaluation scenario where TransferDoc outperforms other state-of-the-art approaches.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up)  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ BBM2023 Serial 3995  
Permanent link to this record
 

 
Author Patricia Suarez; Dario Carpio; Angel Sappa edit  url
doi  openurl
  Title Depth Map Estimation from a Single 2D Image Type Conference Article
  Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal  
  Volume Issue Pages 347-353  
  Keywords  
  Abstract This paper presents an innovative architecture based on a Cycle Generative Adversarial Network (CycleGAN) for the synthesis of high-quality depth maps from monocular images. The proposed architecture leverages a diverse set of loss functions, including cycle consistency, contrastive, identity, and least square losses, to facilitate the generation of depth maps that exhibit realism and high fidelity. A notable feature of the approach is its ability to synthesize depth maps from grayscale images without the need for paired training data. Extensive comparisons with different state-of-the-art methods show the superiority of the proposed approach in both quantitative metrics and visual quality. This work addresses the challenge of depth map synthesis and offers significant advancements in the field.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up)  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SCS2023b Serial 4009  
Permanent link to this record
 

 
Author Rafael E. Rivadeneira; Henry Velesaca; Angel Sappa edit  url
doi  openurl
  Title Object Detection in Very Low-Resolution Thermal Images through a Guided-Based Super-Resolution Approach Type Conference Article
  Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This work proposes a novel approach that integrates super-resolution techniques with off-the-shelf object detection methods to tackle the problem of handling very low-resolution thermal images. The suggested approach begins by enhancing the low-resolution (LR) thermal images through a guided super-resolution strategy, leveraging a high-resolution (HR) visible spectrum image. Subsequently, object detection is performed on the high-resolution thermal image. The experimental results demonstrate tremendous improvements in comparison with both scenarios: when object detection is performed on the LR thermal image alone, as well as when object detection is conducted on the up-sampled LR thermal image. Moreover, the proposed approach proves highly valuable in camouflaged scenarios where objects might remain undetected in visible spectrum images.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up)  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ RVS2023 Serial 4010  
Permanent link to this record
 

 
Author Patricia Suarez; Dario Carpio; Angel Sappa edit  url
doi  openurl
  Title Boosting Guided Super-Resolution Performance with Synthesized Images Type Conference Article
  Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal  
  Volume Issue Pages 189-195  
  Keywords  
  Abstract Guided image processing techniques are widely used for extracting information from a guiding image to aid in the processing of the guided one. These images may be sourced from different modalities, such as 2D and 3D, or different spectral bands, like visible and infrared. In the case of guided cross-spectral super-resolution, features from the two modal images are extracted and efficiently merged to migrate guidance information from one image, usually high-resolution (HR), toward the guided one, usually low-resolution (LR). Different approaches have been recently proposed focusing on the development of architectures for feature extraction and merging in the cross-spectral domains, but none of them care about the different nature of the given images. This paper focuses on the specific problem of guided thermal image super-resolution, where an LR thermal image is enhanced by an HR visible spectrum image. To improve existing guided super-resolution techniques, a novel scheme is proposed that maps the original guiding information to a thermal image-like representation that is similar to the output. Experimental results evaluating five different approaches demonstrate that the best results are achieved when the guiding and guided images share the same domain.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up)  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SCS2023c Serial 4011  
Permanent link to this record
 

 
Author Ruben Tito; Khanh Nguyen; Marlon Tobaben; Raouf Kerkouche; Mohamed Ali Souibgui; Kangsoo Jung; Lei Kang; Ernest Valveny; Antti Honkela; Mario Fritz; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Privacy-Aware Document Visual Question Answering Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up)  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ PNT2023 Serial 4012  
Permanent link to this record
 

 
Author Daniel Marczak; Sebastian Cygert; Tomasz Trzcinski; Bartlomiej Twardowski edit  url
openurl 
  Title Revisiting Supervision for Continual Representation Learning Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, recent studies have highlighted the strengths of self-supervised continual representation learning. The improved transferability of representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up)  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes xxx Approved no  
  Call Number Admin @ si @ MCT2023 Serial 4013  
Permanent link to this record
 

 
Author Jose Luis Gomez; Manuel Silva; Antonio Seoane; Agnes Borras; Mario Noriega; German Ros; Jose Antonio Iglesias; Antonio Lopez edit   pdf
url  openurl
  Title All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract We introduce UrbanSyn, a photorealistic dataset acquired through semi-procedurally generated synthetic urban driving scenarios. Developed using high-quality geometry and materials, UrbanSyn provides pixel-level ground truth, including depth, semantic segmentation, and instance segmentation with object bounding boxes and occlusion degree. It complements GTAV and Synscapes datasets to form what we coin as the 'Three Musketeers'. We demonstrate the value of the Three Musketeers in unsupervised domain adaptation for image semantic segmentation. Results on real-world datasets, Cityscapes, Mapillary Vistas, and BDD100K, establish new benchmarks, largely attributed to UrbanSyn. We make UrbanSyn openly and freely accessible (this http URL).  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up)  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ GSS2023 Serial 4015  
Permanent link to this record
 

 
Author Yael Tudela; Ana Garcia Rodriguez; Gloria Fernandez Esparrach; Jorge Bernal edit  url
doi  openurl
  Title Towards Fine-Grained Polyp Segmentation and Classification Type Conference Article
  Year 2023 Publication Workshop on Clinical Image-Based Procedures Abbreviated Journal  
  Volume 14242 Issue Pages 32-42  
  Keywords Medical image segmentation; Colorectal Cancer; Vision Transformer; Classification  
  Abstract Colorectal cancer is one of the main causes of cancer death worldwide. Colonoscopy is the gold standard screening tool as it allows lesion detection and removal during the same procedure. During the last decades, several efforts have been made to develop CAD systems to assist clinicians in lesion detection and classification. Regarding the latter, and in order to be used in the exploration room as part of resect and discard or leave-in-situ strategies, these systems must identify correctly all different lesion types. This is a challenging task, as the data used to train these systems presents great inter-class similarity, high class imbalance, and low representation of clinically relevant histology classes such as serrated sessile adenomas.

In this paper, a new polyp segmentation and classification method, Swin-Expand, is introduced. Based on Swin-Transformer, it uses a simple and lightweight decoder. The performance of this method has been assessed on a novel dataset, comprising 1126 high-definition images representing the three main histological classes. Results show a clear improvement in both segmentation and classification performance, also achieving competitive results when tested in public datasets. These results confirm that both the method and the data are important to obtain more accurate polyp representations.
 
  Address Vancouver; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference MICCAIW  
  Notes ISE Approved no  
  Call Number Admin @ si @ TGF2023 Serial 3837  
Permanent link to this record
 

 
Author Pau Torras; Mohamed Ali Souibgui; Sanket Biswas; Alicia Fornes edit  url
openurl 
  Title Segmentation-Free Alignment of Arbitrary Symbol Transcripts to Images Type Conference Article
  Year 2023 Publication Document Analysis and Recognition – ICDAR 2023 Workshops Abbreviated Journal  
  Volume 14193 Issue Pages 83-93  
  Keywords Historical Manuscripts; Symbol Alignment  
  Abstract Developing arbitrary symbol recognition systems is a challenging endeavour. Even using content-agnostic architectures such as few-shot models, performance can be substantially improved by providing a number of well-annotated examples into training. In some contexts, transcripts of the symbols are available without any position information associated to them, which enables using line-level recognition architectures. A way of providing this position information to detection-based architectures is finding systems that can align the input symbols with the transcription. In this paper we discuss some symbol alignment techniques that are suitable for low-data scenarios and provide an insight on their perceived strengths and weaknesses. In particular, we study the usage of Connectionist Temporal Classification models, Attention-Based Sequence to Sequence models and we compare them with the results obtained on a few-shot recognition system.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ TSS2023 Serial 3850  
Permanent link to this record
 

 
Author Mickael Coustaty; Alicia Fornes edit  url
openurl 
  Title Document Analysis and Recognition – ICDAR 2023 Workshops Type Book Whole
  Year 2023 Publication Document Analysis and Recognition – ICDAR 2023 Workshops Abbreviated Journal  
  Volume 14194 Issue 2 Pages  
  Keywords  
  Abstract  
  Address San Jose; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ CoF2023 Serial 3852  
Permanent link to this record
 

 
Author Patricia Suarez; Dario Carpio; Angel Sappa edit  url
openurl 
  Title A Deep Learning Based Approach for Synthesizing Realistic Depth Maps Type Conference Article
  Year 2023 Publication 22nd International Conference on Image Analysis and Processing Abbreviated Journal  
  Volume 14234 Issue Pages 369–380  
  Keywords  
  Abstract This paper presents a novel cycle generative adversarial network (CycleGAN) architecture for synthesizing high-quality depth maps from a given monocular image. The proposed architecture uses multiple loss functions, including cycle consistency, contrastive, identity, and least square losses, to enable the generation of realistic and high-fidelity depth maps. The proposed approach addresses this challenge by synthesizing depth maps from RGB images without requiring paired training data. Comparisons with several state-of-the-art approaches are provided showing the proposed approach overcome other approaches both in terms of quantitative metrics and visual quality.  
  Address Udine; Italia; Setember 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICIAP  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SCS2023a Serial 3968  
Permanent link to this record
 

 
Author Francesc Net; Marc Folia; Pep Casals; Lluis Gomez edit  url
openurl 
  Title Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14191 Issue Pages 3-17  
  Keywords Image deduplication; Near-duplicate images detection; Transductive Learning; Photographic Archives; Deep Learning  
  Abstract This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case scenario, where a document management company is commissioned to manually annotate a collection of scanned photographs. Detecting duplicate and near-duplicate photographs can reduce the time spent on manual annotation by archivists. This real use case differs from laboratory settings as the deployment dataset is available in advance, allowing the use of transductive learning. We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). Our approach involves pre-training a deep neural network on a large dataset and then fine-tuning the network on the unlabeled target collection with self-supervised learning. The results show that the proposed approach outperforms the baseline methods in the task of near-duplicate image detection in the UKBench and an in-house private dataset.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ NFC2023 Serial 3859  
Permanent link to this record
 

 
Author Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal edit  url
openurl 
  Title SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14187 Issue Pages 307–325  
  Keywords  
  Abstract Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing for their understanding. In this paper, we present a unified transformer encoder-decoder architecture for en-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed query selection for anchor initialization in the decoder. Later on, it performs a dot product between the obtained query embeddings and the pixel embedding map (coming from the encoder) for semantic reasoning. Extensive experimentation on competitive benchmarks like PubLayNet, PRIMA, Historical Japanese (HJ), and TableBank demonstrate that our model with SwinL backbone achieves better segmentation performance than the existing state-of-the-art approaches with the average precision of 93.72, 54.39, 84.65 and 98.04 respectively under one billion parameters. The code is made publicly available at: github.com/ayanban011/SwinDocSegmenter .  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ BBL2023 Serial 3893  
Permanent link to this record
 

 
Author Wenwen Yu; Chengquan Zhang; Haoyu Cao; Wei Hua; Bohan Li; Huang Chen; Mingyu Liu; Mingrui Chen; Jianfeng Kuang; Mengjun Cheng; Yuning Du; Shikun Feng; Xiaoguang Hu; Pengyuan Lyu; Kun Yao; Yuechen Yu; Yuliang Liu; Wanxiang Che; Errui Ding; Cheng-Lin Liu; Jiebo Luo; Shuicheng Yan; Min Zhang; Dimosthenis Karatzas; Xing Sun; Jingdong Wang; Xiang Bai edit  url
openurl 
  Title ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14188 Issue Pages 536–552  
  Keywords  
  Abstract Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios of past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of the structured text extraction scheme. In order to eliminate these problems, we organized the ICDAR 2023 competition on Structured text extraction from Visually-Rich Document images (SVRD). We set up two tracks for SVRD including Track 1: HUST-CELL and Track 2: Baidu-FEST, where HUST-CELL aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling, and Baidu-FEST focuses on evaluating the performance and generalization of Zero-shot/Few-shot Structured Text extraction from an end-to-end perspective. Compared to the current document benchmarks, our two tracks of competition benchmark enriches the scenarios greatly and contains more than 50 types of visually-rich document images (mainly from the actual enterprise applications). The competition opened on 30th December, 2022 and closed on 24th March, 2023. There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2. In this report we will presents the motivation, competition datasets, task definition, evaluation protocol, and submission summaries. According to the performance of the submissions, we believe there is still a large gap on the expected information extraction performance for complex and zero-shot scenarios. It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ YZC2023 Serial 3896  
Permanent link to this record
 

 
Author Wenwen Yu; Mingyu Liu; Mingrui Chen; Ning Lu; Yinlong We; Yuliang Liu; Dimosthenis Karatzas; Xiang Bai edit  url
openurl 
  Title ICDAR 2023 Competition on Reading the Seal Title Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14188 Issue Pages 522–535  
  Keywords  
  Abstract Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants and received 135 submissions from academia and industry, including 28 participants and 72 submissions for Task 1, and 25 participants and 63 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title (up) LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ YLC2023 Serial 3897  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: