|   | 
Details
   web
Records
Author Shiqi Yang; Yaxing Wang; Joost Van de Weijer; Luis Herranz; Shangling Jui
Title Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation Type Conference Article
Year 2021 Publication Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (down) Domain adaptation (DA) aims to alleviate the domain shift between source domain and target domain. Most DA methods require access to the source data, but often that is not possible (e.g. due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might no longer align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors, and propose a self regularization loss to decrease the negative impact of noisy neighbors. Furthermore, to aggregate information with more context, we consider expanded neighborhoods with small affinity values. In the experimental results we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets. Code is available in https://github.com/Albert0147/SFDA_neighbors.
Address Online; December 7-10, 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NIPS
Notes LAMP; 600.147; 600.141 Approved no
Call Number Admin @ si @ Serial 3691
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Y.Kessentini
Title DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement Type Journal Article
Year 2022 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume 44 Issue 3 Pages 1180-1191
Keywords
Abstract (down) Documents often exhibit various forms of degradation, which make it hard to be read and substantially deteriorate the performance of an OCR system. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) that uses the conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this practice has not been studied within the context of generative adversarial deep networks. We demonstrate that, in different tasks (document clean up, binarization, deblurring and watermark removal), DE-GAN can produce an enhanced version of the degraded document with a high quality. In addition, our approach provides consistent improvements compared to state-of-the-art methods over the widely used DIBCO 2013, DIBCO 2017 and H-DIBCO 2018 datasets, proving its ability to restore a degraded document image to its ideal condition. The obtained results on a wide variety of degradation reveal the flexibility of the proposed model to be exploited in other document enhancement problems.
Address 1 March 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 602.230; 600.121; 600.140 Approved no
Call Number Admin @ si @ SoK2022 Serial 3454
Permanent link to this record
 

 
Author Carles Sanchez; Debora Gil; T. Gache; N. Koufos; Marta Diez-Ferrer; Antoni Rosell
Title SENSA: a System for Endoscopic Stenosis Assessment Type Conference Article
Year 2016 Publication 28th Conference of the international Society for Medical Innovation and Technology Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (down) Documenting the severity of a static or dynamic Central Airway Obstruction (CAO) is crucial to establish proper diagnosis and treatment, predict possible treatment effects and better follow-up the patients. The subjective visual evaluation of a stenosis during video-bronchoscopy still remains the most common way to assess a CAO in spite of a consensus among experts for a need to standardize all calculations [1].
The Computer Vision Center in cooperation with the «Hospital de Bellvitge», has developed a System for Endoscopic Stenosis Assessment (SENSA), which computes CAO directly by analyzing standard bronchoscopic data without the need of using other imaging tecnologies.
Address Rotterdam; The Netherlands; October 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SMIT
Notes IAM; Approved no
Call Number Admin @ si @ SGG2016 Serial 2942
Permanent link to this record
 

 
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 144 Issue Pages 109834
Keywords
Abstract (down) Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.155; 600.121 Approved no
Call Number Admin @ si @ TKV2023 Serial 3825
Permanent link to this record
 

 
Author Ruben Tito; Khanh Nguyen; Marlon Tobaben; Raouf Kerkouche; Mohamed Ali Souibgui; Kangsoo Jung; Lei Kang; Ernest Valveny; Antti Honkela; Mario Fritz; Dimosthenis Karatzas
Title Privacy-Aware Document Visual Question Answering Type Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (down) Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ PNT2023 Serial 4012
Permanent link to this record
 

 
Author Subhajit Maity; Sanket Biswas; Siladittya Manna; Ayan Banerjee; Josep Llados; Saumik Bhattacharya; Umapada Pal
Title SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation Type Conference Article
Year 2023 Publication 17th International Conference on Doccument Analysis and Recognition Abbreviated Journal
Volume 14187 Issue Pages 342–360
Keywords
Abstract (down) Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity to personal life, an enormous amount of documents had been available in the public domain and thus making data annotation a tedious task. We address this challenge using self-supervision and unlike, the few existing self-supervised document segmentation approaches which use text mining and textual labels, we use a complete vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn the document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs at par with the existing methods and the supervised counterparts, if not outperforms. The code is made publicly available at: this https URL
Address Document Layout Analysis; Document
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ MBM2023 Serial 3990
Permanent link to this record
 

 
Author Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal
Title SemiDocSeg: Harnessing Semi-Supervised Learning for Document Layout Analysis Type Journal Article
Year 2024 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume Issue Pages
Keywords Document layout analysis; Semi-supervised learning; Co-Occurrence matrix; Instance segmentation; Swin transformer
Abstract (down) Document Layout Analysis (DLA) is the process of automatically identifying and categorizing the structural components (e.g. Text, Figure, Table, etc.) within a document to extract meaningful content and establish the page's layout structure. It is a crucial stage in document parsing, contributing to their comprehension. However, traditional DLA approaches often demand a significant volume of labeled training data, and the labor-intensive task of generating high-quality annotated training data poses a substantial challenge. In order to address this challenge, we proposed a semi-supervised setting that aims to perform learning on limited annotated categories by eliminating exhaustive and expensive mask annotations. The proposed setting is expected to be generalizable to novel categories as it learns the underlying positional information through a support set and class information through Co-Occurrence that can be generalized from annotated categories to novel categories. Here, we first extract features from the input image and support set with a shared multi-scale feature acquisition backbone. Then, the extracted feature representation is fed to the transformer encoder as a query. Later on, we utilize a semantic embedding network before the decoder to capture the underlying semantic relationships and similarities between different instances, enabling the model to make accurate predictions or classifications with only a limited amount of labeled data. Extensive experimentation on competitive benchmarks like PRIMA, DocLayNet, and Historical Japanese (HJ) demonstrate that this generalized setup obtains significant performance compared to the conventional supervised approach.
Address June 2024
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ BBL2024a Serial 4001
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Sanket Biswas; Sana Khamekhem Jemni; Yousri Kessentini; Alicia Fornes; Josep Llados; Umapada Pal
Title DocEnTr: An End-to-End Document Image Enhancement Transformer Type Conference Article
Year 2022 Publication 26th International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages 1699-1705
Keywords Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
Abstract (down) Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
Address August 21-25, 2022 , Montréal Québec
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no
Call Number Admin @ si @ SBJ2022 Serial 3730
Permanent link to this record
 

 
Author Albert Gordo; Ernest Valveny
Title A rotation invariant page layout descriptor for document classification and retrieval Type Conference Article
Year 2009 Publication 10th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 481–485
Keywords
Abstract (down) Document classification usually requires of structural features such as the physical layout to obtain good accuracy rates on complex documents. This paper introduces a descriptor of the layout and a distance measure based on the cyclic dynamic time warping which can be computed in O(n2). This descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover. The experiments show that these methods outperform the MWEC both in accuracy and speed, particularly on rotated documents.
Address Barcelona, Spain
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1520-5363 ISBN 978-1-4244-4500-4 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number DAG @ dag @ GoV2009a Serial 1175
Permanent link to this record
 

 
Author Albert Gordo; Ernest Valveny
Title The diagonal split: A pre-segmentation step for page layout analysis & classification Type Conference Article
Year 2009 Publication 4th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal
Volume 5524 Issue Pages 290–297
Keywords
Abstract (down) Document classification is an important task in all the processes related to document storage and retrieval. In the case of complex documents, structural features are needed to achieve a correct classification. Unfortunately, physical layout analysis is error prone. In this paper we present a pre-segmentation step based on a divide & conquer strategy that can be used to improve the page segmentation results, independently of the segmentation algorithm used. This pre-segmentation step is evaluated in classification and retrieval using the selective CRLA algorithm for layout segmentation together with a clustering based on the voronoi area diagram, and tested on two different databases, MARG and Girona Archives.
Address Póvoa de Varzim, Portugal
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-02171-8 Medium
Area Expedition Conference IbPRIA
Notes DAG Approved no
Call Number DAG @ dag @ Gov2009b Serial 1176
Permanent link to this record
 

 
Author Christophe Rigaud; Clement Guerin; Dimosthenis Karatzas; Jean-Christophe Burie; Jean-Marc Ogier
Title Knowledge-driven understanding of images in comic books Type Journal Article
Year 2015 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume 18 Issue 3 Pages 199-221
Keywords Document Understanding; comics analysis; expert system
Abstract (down) Document analysis is an active field of research, which can attain a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to a predefined domain knowledge. In this study, we propose a knowledge-driven system that can interact with bottom-up and top-down information to progressively understand the content of a document. We model the comic book’s and the image processing domains knowledge for information consistency analysis. In addition, different image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters and their semantic relations in an unsupervised way.
Address
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1433-2833 ISBN Medium
Area Expedition Conference
Notes DAG; 600.056; 600.077 Approved no
Call Number RGK2015 Serial 2595
Permanent link to this record
 

 
Author Anjan Dutta; Pau Riba; Josep Llados; Alicia Fornes
Title Pyramidal Stochastic Graphlet Embedding for Document Pattern Classification Type Conference Article
Year 2017 Publication 14th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 33-38
Keywords graph embedding; hierarchical graph representation; graph clustering; stochastic graphlet embedding; graph classification
Abstract (down) Document pattern classification methods using graphs have received a lot of attention because of its robust representation paradigm and rich theoretical background. However, the way of preserving and the process for delineating documents with graphs introduce noise in the rendition of underlying data, which creates instability in the graph representation. To deal with such unreliability in representation, in this paper, we propose Pyramidal Stochastic Graphlet Embedding (PSGE).
Given a graph representing a document pattern, our method first computes a graph pyramid by successively reducing the base graph. Once the graph pyramid is computed, we apply Stochastic Graphlet Embedding (SGE) for each level of the pyramid and combine their embedded representation to obtain a global delineation of the original graph. The consideration of pyramid of graphs rather than just a base graph extends the representational power of the graph embedding, which reduces the instability caused due to noise and distortion. When plugged with support
vector machine, our proposed PSGE has outperformed the state-of-the-art results in recognition of handwritten words as well as graphical symbols
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.097; 601.302; 600.121 Approved no
Call Number Admin @ si @ DRL2017 Serial 3054
Permanent link to this record
 

 
Author Md. Mostafa Kamal Sarker; Mohammed Jabreel; Hatem A. Rashwan; Syeda Furruka Banu; Petia Radeva; Domenec Puig
Title CuisineNet: Food Attributes Classification using Multi-scale Convolution Network Type Conference Article
Year 2018 Publication 21st International Conference of the Catalan Association for Artificial Intelligence Abbreviated Journal
Volume Issue Pages 365-372
Keywords
Abstract (down) Diversity of food and its attributes represents the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying food culture of people around the world and its flavor by classifying two main food attributes, cuisine and flavor. A deep learning model based on multi-scale convotuional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel size is also used for weighting the features results from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to multi labeled classes for multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, so-called Yummly48K, extracted from the popular food website, Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on validation and test set which outperforming the state-of-the-art models.
Address Roses; catalonia; October 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CCIA
Notes MILAB; no menciona Approved no
Call Number Admin @ si @ SJR2018 Serial 3113
Permanent link to this record
 

 
Author Md. Mostafa Kamal Sarker; Mohammed Jabreel; Hatem A. Rashwan; Syeda Furruka Banu; Antonio Moreno; Petia Radeva; Domenec Puig
Title CuisineNet: Food Attributes Classification using Multi-scale Convolution Network. Type Miscellaneous
Year 2018 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (down) Diversity of food and its attributes represents the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying food culture of people around the world and its flavor by classifying two main food attributes, cuisine and flavor. A deep learning model based on multi-scale convotuional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel size is also used for weighting the features results from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to multi labeled classes for multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, so-called Yummly48K, extracted from the popular food website, Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on validation and test set which outperforming the state-of-the-art models.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no proj Approved no
Call Number Admin @ si @ KJR2018 Serial 3235
Permanent link to this record
 

 
Author Antonio Carta; Andrea Cossu; Vincenzo Lomonaco; Davide Bacciu; Joost Van de Weijer
Title Projected Latent Distillation for Data-Agnostic Consolidation in Distributed Continual Learning Type Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (down) Distributed learning on the edge often comprises self-centered devices (SCD) which learn local tasks independently and are unwilling to contribute to the performance of other SDCs. How do we achieve forward transfer at zero cost for the single SCDs? We formalize this problem as a Distributed Continual Learning scenario, where SCD adapt to local tasks and a CL model consolidates the knowledge from the resulting stream of models without looking at the SCD's private data. Unfortunately, current CL methods are not directly applicable to this scenario. We propose Data-Agnostic Consolidation (DAC), a novel double knowledge distillation method that consolidates the stream of SC models without using the original data. DAC performs distillation in the latent space via a novel Projected Latent Distillation loss. Experimental results show that DAC enables forward transfer between SCDs and reaches state-of-the-art accuracy on Split CIFAR100, CORe50 and Split TinyImageNet, both in reharsal-free and distributed CL scenarios. Somewhat surprisingly, even a single out-of-distribution image is sufficient as the only source of data during consolidation.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ CCL2023 Serial 3871
Permanent link to this record