|
Records |
Links |
|
Author |
Andreas Fischer; Volkmar Frinken; Horst Bunke; Ching Y. Suen |
|
|
Title |
Improving HMM-Based Keyword Spotting with Character Language Models |
Type |
Conference Article |
|
Year |
2013 |
Publication |
12th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
506-510 |
|
|
Keywords |
|
|
|
Abstract |
Facing high error rates and slow recognition speed for full text transcription of unconstrained handwriting images, keyword spotting is a promising alternative to locate specific search terms within scanned document images. We have previously proposed a learning-based method for keyword spotting using character hidden Markov models that showed a high performance when compared with traditional template image matching. In the lexicon-free approach pursued, only the text appearance was taken into account for recognition. In this paper, we integrate character n-gram language models into the spotting system in order to provide an additional language context. On the modern IAM database as well as the historical George Washington database, we demonstrate that character language models significantly improve the spotting performance. |
|
|
Address |
Washington; USA; August 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1520-5363 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.045; 605.203 |
Approved |
no |
|
|
Call Number |
Admin @ si @ FFB2013 |
Serial |
2295 |
|
Permanent link to this record |
|
|
|
|
Author |
Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1381-1390 |
|
|
Keywords |
Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data |
|
|
Abstract |
Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in the state-of-the-art captioning models which is not desirable by humans. To decrease the object hallucination in captioning, we propose three simple yet efficient training augmentation method for sentences which requires no new training data or increase
in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online. |
|
|
Address |
Virtual; Waikoloa; Hawai; USA; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.155; 302.105 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BGK2022 |
Serial |
3662 |
|
Permanent link to this record |
|
|
|
|
Author |
Ruben Tito; Dimosthenis Karatzas; Ernest Valveny |
|
|
Title |
Hierarchical multimodal transformers for Multipage DocVQA |
Type |
Journal Article |
|
Year |
2023 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
144 |
Issue |
109834 |
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification. Our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then, the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ TKV2023 |
Serial |
3836 |
|
Permanent link to this record |
|
|
|
|
Author |
Lluis Gomez; Y. Patel; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas |
|
|
Title |
Self‐supervised learning of visual features through embedding images into text topic spaces |
Type |
Conference Article |
|
Year |
2017 |
Publication |
30th IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state of the art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches. |
|
|
Address |
Honolulu; Hawaii; July 2017 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPR |
|
|
Notes |
DAG; 600.084; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GPR2017 |
Serial |
2889 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Alicia Fornes; Y.Kessentini; C.Tudor |
|
|
Title |
A Few-shot Learning Approach for Historical Encoded Manuscript Recognition |
Type |
Conference Article |
|
Year |
2021 |
Publication |
25th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
5413-5420 |
|
|
Keywords |
|
|
|
Abstract |
Encoded (or ciphered) manuscripts are a special type of historical documents that contain encrypted text. The automatic recognition of this kind of documents is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpus for training and 3) touching symbols make the symbol segmentation difficult and complex. To overcome these difficulties, we propose a novel method for handwritten ciphers recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if few labeled pages with the same alphabet are used for fine tuning, our method surpasses existing unsupervised and supervised HTR methods for ciphers recognition. |
|
|
Address |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; 600.121; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SFK2021 |
Serial |
3449 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Enric Marti; Jordi Regincos |
|
|
Title |
Interpretación de diseños a mano alzada como técnica de entrada a un sistema CAD en un ámbito de arquitectura |
Type |
Conference Article |
|
Year |
1993 |
Publication |
III National Conference on Computer Graphics |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
En los últimos años, se ha introducido ámpliamente el uso de los sistemas CAD en dominios relacionados con la arquitectura. Dichos sistemas CAD son muy útiles para el arquitecto en el diseño de planos de plantas de edificios. Sin embargo, la utilización eficiente de un CAD requiere un tiempo de aprendizaje, en especial, en la etapa de creación y edición del diseño. Además, una vez familiarizado con un CAD, el arquitecto debe adaptarse a la simbología que éste le permite que, en algunos casos puede ser poco flexible.Con esta motivación, se propone una técnica alternativa de entrada de documentos en sistemas CAD. Dicha técnica se basa en el diseño del plano sobre papel mediante un dibujo lineal hecho a mano alzada a modo de boceto e introducido mediante scanner. Una vez interpretado este dibujo inicial e introducido en el CAD, el arquitecto sólo deber hacer sobre éste los retoques finales del documento.El sistema de entrada propuesto se compone de dos módulos principales: En primer lugar, la extracción de características (puntos característicos, rectas y arcos) de la imagen obtenida mediante scanner. En dicho módulo se aplican principalmente técnicas de procesamiento de imágenes obteniendo como resultado una representaci¢n del dibujo de entrada basada en grafos de atributos. El objetivo del segundo módulo es el de encontrar y reconocer las entidades integrantes del documento (puertas, mesas, etc.) en base a una biblioteca de símbolos definida en el sistema CAD. La implementación de dicho módulo se basa en técnicas de isomorfismo de grafos.El sistema propone una alternativa que permita, mediante el diseño a mano alzada, la introducción de la informaci¢n m s significativa del plano de forma rápida, sencilla y estandarizada por parte del usuario. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
Granada |
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG;IAM; |
Approved |
no |
|
|
Call Number |
IAM @ iam @ LMR1993 |
Serial |
1571 |
|
Permanent link to this record |
|
|
|
|
Author |
Anguelos Nicolaou; Sounak Dey; V.Christlein; A.Maier; Dimosthenis Karatzas |
|
|
Title |
Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings |
Type |
Conference Article |
|
Year |
2018 |
Publication |
International Workshop on Reproducible Research in Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
11455 |
Issue |
|
Pages |
71-82 |
|
|
Keywords |
|
|
|
Abstract |
Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis of the ambiguity quantized distances introduce and provide bounds on the effect. We demonstrate that it can have a measurable effect in empirical data in state-of-the-art systems. We also approach the phenomenon from a computer security perspective and demonstrate how someone being evaluated by a third party can exploit this ambiguity and greatly outperform a random predictor without even access to the input data. We also suggest a simple solution making the performance metrics, which rely on ranking, totally deterministic and impervious to such exploits. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ NDC2018 |
Serial |
3178 |
|
Permanent link to this record |
|
|
|
|
Author |
P. Wang; V. Eglin; C. Garcia; C. Largeron; Josep Llados; Alicia Fornes |
|
|
Title |
A Novel Learning-free Word Spotting Approach Based on Graph Representation |
Type |
Conference Article |
|
Year |
2014 |
Publication |
11th IAPR International Workshop on Document Analysis and Systems |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
207-211 |
|
|
Keywords |
|
|
|
Abstract |
Effective information retrieval on handwritten document images has always been a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment result is introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods. |
|
|
Address |
Tours; France; April 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4799-3243-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG; 600.061; 602.006; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ WEG2014b |
Serial |
2517 |
|
Permanent link to this record |
|
|
|
|
Author |
P. Wang; V. Eglin; C. Garcia; C. Largeron; Josep Llados; Alicia Fornes |
|
|
Title |
A Coarse-to-Fine Word Spotting Approach for Historical Handwritten Documents Based on Graph Embedding and Graph Edit Distance |
Type |
Conference Article |
|
Year |
2014 |
Publication |
22nd International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
3074 - 3079 |
|
|
Keywords |
word spotting; coarse-to-fine mechamism; graphbased representation; graph embedding; graph edit distance |
|
|
Abstract |
Effective information retrieval on handwritten document images has always been a challenging task, especially historical ones. In the paper, we propose a coarse-to-fine handwritten word spotting approach based on graph representation. The presented model comprises both the topological and morphological signatures of the handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. Aiming at developing a practical and efficient word spotting approach for large-scale historical handwritten documents, a fast and coarse comparison is first applied to prune the regions that are not similar to the query based on the graph embedding methodology. Afterwards, the query and regions of interest are compared by graph edit distance based on the Dynamic Time Warping alignment. The proposed approach is evaluated on a public dataset containing 50 pages of historical marriage license records. The results show that the proposed approach achieves a compromise between efficiency and accuracy. |
|
|
Address |
Stockholm; Sweden; August 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1051-4651 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; 600.061; 602.006; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ WEG2014a |
Serial |
2515 |
|
Permanent link to this record |
|
|
|
|
Author |
P. Wang; V. Eglin; C. Garcia; C. Largeron; Josep Llados; Alicia Fornes |
|
|
Title |
Représentation par graphe de mots manuscrits dans les images pour la recherche par similarité |
Type |
Conference Article |
|
Year |
2014 |
Publication |
Colloque International Francophone sur l'Écrit et le Document |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
233-248 |
|
|
Keywords |
word spotting; graph-based representation; shape context description; graph edit distance; DTW; block merging; query by example |
|
|
Abstract |
Effective information retrieval on handwritten document images has always been
a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labeled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment results introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods. |
|
|
Address |
Nancy; Francia; March 2014 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CIFED |
|
|
Notes |
DAG; 600.061; 602.006; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ WEG2014c |
Serial |
2564 |
|
Permanent link to this record |