Author Mohamed Ali Souibgui; Y. Kessentini
  Title DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement Type Journal Article
  Year 2022 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 44 Issue 3 Pages 1180-1191  
  Abstract Documents often exhibit various forms of degradation, which make them hard to read and substantially deteriorate the performance of OCR systems. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) that uses conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this practice has not been studied within the context of generative adversarial deep networks. We demonstrate that, in different tasks (document clean up, binarization, deblurring and watermark removal), DE-GAN can produce an enhanced version of the degraded document of high quality. In addition, our approach provides consistent improvements compared to state-of-the-art methods over the widely used DIBCO 2013, DIBCO 2017 and H-DIBCO 2018 datasets, proving its ability to restore a degraded document image to its ideal condition. The results obtained on a wide variety of degradations reveal the flexibility of the proposed model to be exploited in other document enhancement problems.  
  Address 1 March 2022  
  Notes DAG; 602.230; 600.121; 600.140 Approved no  
  Call Number Admin @ si @ SoK2022 Serial 3454  
 

 
Author Lei Kang; Pau Riba; Marcal Rusinol; Alicia Fornes; Mauricio Villegas
  Title Content and Style Aware Generation of Text-line Images for Handwriting Recognition Type Journal Article
  Year 2021 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Abstract Handwritten Text Recognition has achieved an impressive performance in public benchmarks. However, due to the high inter- and intra-class variability between handwriting styles, such recognizers need to be trained using huge volumes of manually labeled training data. To alleviate this labor-consuming problem, synthetic data produced with TrueType fonts has often been used in the training loop to gain volume and augment the handwriting style variability. However, there is a significant style bias between synthetic and real data which hinders the improvement of recognition performance. To deal with such limitations, we propose a generative method for handwritten text-line images, which is conditioned on both visual appearance and textual content. Our method is able to produce long text-line samples with diverse handwriting styles. Once properly trained, our method can also be adapted to new target data by only accessing unlabeled text-line images to mimic handwritten styles and produce images with any textual content. Extensive experiments have been done on making use of the generated samples to boost Handwritten Text Recognition performance. Both qualitative and quantitative results demonstrate that the proposed approach outperforms the current state of the art.  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ KRR2021 Serial 3612  
 

 
Author Jon Almazan; Albert Gordo; Alicia Fornes; Ernest Valveny
  Title Word Spotting and Recognition with Embedded Attributes Type Journal Article
  Year 2014 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 36 Issue 12 Pages 2552-2566  
  Abstract This article addresses the problems of word spotting and word recognition on images. In word spotting, the goal is to find all instances of a query word in a dataset of images. In recognition, the goal is to recognize the content of the word image, usually aided by a dictionary or lexicon. We describe an approach in which both word images and text strings are embedded in a common vectorial subspace. This is achieved by a combination of label embedding and attributes learning, and a common subspace regression. In this subspace, images and strings that represent the same word are close together, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem. Contrary to most other existing methods, our representation has a fixed length, is low dimensional, and is very fast to compute and, especially, to compare. We test our approach on four public datasets of both handwritten documents and natural images, showing results comparable to or better than the state of the art on spotting and recognition tasks.  
  ISSN 0162-8828  
  Notes DAG; 600.056; 600.045; 600.061; 602.006; 600.077 Approved no  
  Call Number Admin @ si @ AGF2014a Serial 2483  
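
The abstract of the record above casts recognition and spotting as nearest-neighbor search in a common subspace. The following is a minimal sketch of that idea only, assuming word images and text strings have already been embedded as vectors of equal length. The toy vectors, the function names `recognize` and `spot`, and the choice of cosine similarity are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recognize(image_vec, lexicon):
    """Recognition: return the lexicon word whose embedding is nearest to the image."""
    return max(lexicon, key=lambda word: cosine(image_vec, lexicon[word]))


def spot(query_vec, image_vecs):
    """Spotting: rank image ids by decreasing similarity to the query embedding."""
    return sorted(image_vecs, key=lambda i: cosine(query_vec, image_vecs[i]), reverse=True)


# Hypothetical embeddings; in practice these would come from a learned model.
lexicon = {"cat": [1.0, 0.0, 0.1], "dog": [0.0, 1.0, 0.1], "car": [0.7, 0.0, 0.7]}
images = {"img_a": [0.9, 0.1, 0.0], "img_b": [0.1, 0.8, 0.2], "img_c": [0.6, 0.1, 0.8]}

print(recognize(images["img_a"], lexicon))  # nearest lexicon entry for img_a
print(spot(lexicon["dog"], images))         # images ranked for the query string "dog"
```

Because both modalities share one space, the same similarity function serves query-by-string spotting and lexicon-based recognition.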
 

 
Author Anjan Dutta; Hichem Sahbi
  Title Stochastic Graphlet Embedding Type Journal Article
  Year 2018 Publication IEEE Transactions on Neural Networks and Learning Systems Abbreviated Journal TNNLS  
  Pages 1-14  
  Keywords Stochastic graphlets; Graph embedding; Graph classification; Graph hashing; Betweenness centrality  
  Abstract Graph-based methods are known to be successful in many machine learning and pattern classification tasks. These methods consider semi-structured data as graphs where nodes correspond to primitives (parts, interest points, segments, etc.) and edges characterize the relationships between these primitives. However, these non-vectorial graph data cannot be straightforwardly plugged into off-the-shelf machine learning algorithms without a preliminary step of (explicit/implicit) graph vectorization and embedding. This embedding process should be resilient to intra-class graph variations while being highly discriminant. In this paper, we propose a novel high-order stochastic graphlet embedding (SGE) that maps graphs into vector spaces. Our main contribution includes a new stochastic search procedure that efficiently parses a given graph and extracts/samples unlimitedly high-order graphlets. We consider these graphlets, with increasing orders, to model local primitives as well as their increasingly complex interactions. In order to build our graph representation, we measure the distribution of these graphlets into a given graph, using particular hash functions that efficiently assign sampled graphlets into isomorphic sets with a very low probability of collision. When combined with maximum margin classifiers, these graphlet-based representations have positive impact on the performance of pattern comparison and recognition as corroborated through extensive experiments using standard benchmark databases.  
  Notes DAG; 602.167; 602.168; 600.097; 600.121 Approved no  
  Call Number Admin @ si @ DuS2018 Serial 3225  
 

 
Author Yunchao Gong; Svetlana Lazebnik; Albert Gordo; Florent Perronnin
  Title Iterative quantization: A procrustean approach to learning binary codes for Large-Scale Image Retrieval Type Journal Article
  Year 2012 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 35 Issue 12 Pages 2916-2929  
  Abstract This paper addresses the problem of learning similarity-preserving binary codes for efficient similarity search in large-scale image collections. We formulate this problem in terms of finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube, and propose a simple and efficient alternating minimization algorithm to accomplish this task. This algorithm, dubbed iterative quantization (ITQ), has connections to multi-class spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). The resulting binary codes significantly outperform several other state-of-the-art methods. We also show that further performance improvements can result from transforming the data with a nonlinear kernel mapping prior to PCA or CCA. Finally, we demonstrate an application of ITQ to learning binary attributes or “classemes” on the ImageNet dataset.  
  ISSN 0162-8828 ISBN 978-1-4577-0394-2  
  Notes DAG Approved no  
  Call Number Admin @ si @ GLG 2012b Serial 2008  
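
The alternating minimization summarized in the abstract of the record above (find a rotation of zero-centered data that minimizes the quantization error to the vertices of a binary hypercube) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' released code: the function name `itq`, the random orthogonal initialization, the fixed iteration count, and the unsupervised PCA projection are choices made here for the example.

```python
import numpy as np


def itq(X, n_bits, n_iter=50, seed=0):
    """Sketch of iterative quantization on data X (n_samples x n_features).

    Returns boolean codes, the learned rotation R, and the quantization
    error ||B - V R||_F^2 recorded after each rotation update.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                        # zero-center the data
    # PCA projection onto the top n_bits principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Xc @ Vt[:n_bits].T
    # Assumed initialization: a random orthogonal matrix from a QR factorization.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    errors = []
    for _ in range(n_iter):
        # Step 1: fix R, pick the nearest hypercube vertex for each point.
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # Step 2: fix B, solve the orthogonal Procrustes problem for R.
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
        errors.append(float(np.linalg.norm(B - V @ R) ** 2))
    codes = (V @ R) >= 0
    return codes, R, errors


# Usage on synthetic data (hypothetical sizes).
data = np.random.default_rng(1).standard_normal((200, 8))
codes, R, errors = itq(data, n_bits=4, n_iter=20)
print(codes.shape, errors[0], errors[-1])
```

Each step can only lower or keep the objective, so the recorded error should not increase from one iteration to the next, up to floating-point tolerance.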
 

 
Author Alloy Das; Sanket Biswas; Umapada Pal; Josep Llados
  Title Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes Type Conference Article
  Year 2024 Publication IEEE International Conference on Robotics and Automation in PACIFICO Abbreviated Journal  
  Abstract When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of domain-agnostic scene text spotting, i.e., training a model on multi-domain source data such that it can directly generalize to target domains rather than being specialized for a specific domain or scenario. In this regard, we present to the community a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes to establish an important case study. Moreover, we also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter which achieves comparable or superior performance over existing text spotting architectures for both regular and arbitrary-shaped scene text spotting benchmarks in terms of both accuracy and model efficiency. The dataset, code and pre-trained models will be released upon acceptance.  
  Address Yokohama; Japan; May 2024  
  Area Expedition Conference ICRA  
  Notes DAG Approved no  
  Call Number Admin @ si @ DBP2024 Serial 3979  
 

 
Author Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
  Title RoadText-1K: Text Detection and Recognition Dataset for Driving Videos Type Conference Article
  Year 2020 Publication IEEE International Conference on Robotics and Automation Abbreviated Journal  
  Abstract Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new “RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtext-1k  
  Address Paris; France; ???  
  Area Expedition Conference ICRA  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ RMG2020 Serial 3400  
 

 
Author Mohammed Al Rawi; Ernest Valveny
  Title Compact and Efficient Multitask Learning in Vision, Language and Speech Type Conference Article
  Year 2019 Publication IEEE International Conference on Computer Vision Workshops Abbreviated Journal  
  Pages 2933-2942  
  Abstract Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains.  
  Address Seoul; Korea; October 2019  
  Area Expedition Conference ICCVW  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ RaV2019 Serial 3365  
 

 
Author Albert Gordo; Florent Perronnin
  Title Asymmetric Distances for Binary Embeddings Type Conference Article
  Year 2011 Publication IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Pages 729-736  
  Abstract In large-scale query-by-example retrieval, embedding image signatures in a binary space offers two benefits: data compression and search efficiency. While most embedding algorithms binarize both query and database signatures, it has been noted that this is not strictly a requirement. Indeed, asymmetric schemes which binarize the database signatures but not the query still enjoy the same two benefits but may provide superior accuracy. In this work, we propose two general asymmetric distances which are applicable to a wide variety of embedding techniques including Locality Sensitive Hashing (LSH), Locality Sensitive Binary Codes (LSBC), Spectral Hashing (SH) and Semi-Supervised Hashing (SSH). We experiment on four public benchmarks containing up to 1M images and show that the proposed asymmetric distances consistently lead to large improvements over the symmetric Hamming distance for all binary embedding techniques. We also propose a novel simple binary embedding technique – PCA Embedding (PCAE) – which is shown to yield competitive results with respect to more complex algorithms such as SH and SSH.  
  Address Providence, RI  
  ISBN 978-1-4577-0394-2  
  Area Expedition Conference CVPR  
  Notes DAG Approved no  
  Call Number Admin @ si @ GoP2011; IAM @ iam @ GoP2011 Serial 1817  
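
The abstract of the record above contrasts the symmetric Hamming distance with asymmetric schemes that binarize the database signatures but keep the query real-valued. The sketch below illustrates that general idea with one simple variant: each database bit is mapped back to +1 or -1 and compared to the unbinarized query with a squared Euclidean distance. This variant, the function names, and the toy values are assumptions for illustration; they are not claimed to be either of the two distances proposed in the paper.

```python
def binarize(x):
    """Sign binarization: 1 where the value is non-negative, else 0."""
    return [1 if v >= 0 else 0 for v in x]


def hamming(a, b):
    """Symmetric distance: number of differing bits between two codes."""
    return sum(p != q for p, q in zip(a, b))


def asymmetric_distance(query, code):
    """Squared distance between a real-valued query and a code mapped to +/-1."""
    return sum((q - (1.0 if bit else -1.0)) ** 2 for q, bit in zip(query, code))


# Hypothetical real-valued query projection and two stored binary codes.
query = [0.9, 0.05, -0.8]
code_a = [1, 0, 0]
code_b = [0, 1, 0]

query_bits = binarize(query)
print(hamming(query_bits, code_a), hamming(query_bits, code_b))
print(asymmetric_distance(query, code_a), asymmetric_distance(query, code_b))
```

In this toy case both codes are at the same Hamming distance from the binarized query, while the asymmetric distance separates them: the second query coordinate lies close to the binarization threshold, so a flipped second bit costs far less than a flipped first bit.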
 

 
Author Sounak Dey; Pau Riba; Anjan Dutta; Josep Llados; Yi-Zhe Song
  Title Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval Type Conference Article
  Year 2019 Publication IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Pages 2179-2188  
  Abstract In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We advance prior art by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of ones included in existing datasets that can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos in a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, retrieval performance that significantly outperforms the state of the art on existing datasets can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research.  
  Address Long Beach; CA; USA; June 2019  
  Area Expedition Conference CVPR  
  Notes DAG; 600.140; 600.121; 600.097 Approved no  
  Call Number Admin @ si @ DRD2019 Serial 3462  