|
Ernest Valveny and Antonio Lopez. 2003. Numeral Recognition for Quality Control of Surgical Sachets.
|
|
|
Marçal Rusiñol, David Aldavert, Ricardo Toledo and Josep Llados. 2011. Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method. 11th International Conference on Document Analysis and Recognition. 63–67.
Abstract: In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts.
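Illustration: the latent semantic indexing refinement mentioned in the abstract can be sketched as a truncated SVD over bag-of-visual-words histograms. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `lsi_project` and `embed_query`, the toy histogram values, and the choice of `k` are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def lsi_project(bovw, k):
    """Reduce bag-of-visual-words histograms (one row per patch) to k latent dimensions.

    Returns the reduced patch vectors and the basis needed to embed queries.
    """
    # Thin SVD: bovw = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(bovw, full_matrices=False)
    patches_k = u[:, :k] * s[:k]   # patch coordinates in the latent space
    return patches_k, vt[:k]       # vt[:k] spans the k strongest latent topics

def embed_query(query_hist, vt_k):
    """Fold a query histogram into the same latent space as the indexed patches."""
    return query_hist @ vt_k.T

# Hypothetical toy data: 4 patches described over a 5-word visual vocabulary.
histograms = np.array([
    [2.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 2.0, 1.0],
])
patches_k, vt_k = lsi_project(histograms, k=2)
query_k = embed_query(histograms[0], vt_k)
```

Embedding an indexed histogram as a query lands on that patch's own reduced vector, so query and patch vectors can be compared directly (for example with cosine similarity) in the reduced space.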
|
|
|
Adria Rico and Alicia Fornes. 2017. Camera-based Optical Music Recognition using a Convolutional Neural Network. 12th IAPR International Workshop on Graphics Recognition. 27–28.
Abstract: Optical Music Recognition (OMR) consists of recognizing images of music scores. Contrary to expectation, current OMR systems usually fail when recognizing images of scores captured by digital cameras and smartphones. In this work, we propose a camera-based OMR system based on Convolutional Neural Networks, showing promising preliminary results.
Keywords: optical music recognition; document analysis; convolutional neural network; deep learning
|
|
|
Marçal Rusiñol, David Aldavert, Dimosthenis Karatzas, Ricardo Toledo and Josep Llados. 2011. Interactive Trademark Image Retrieval by Fusing Semantic and Visual Content. Advances in Information Retrieval. In P. Clough and 6 others, eds. 33rd European Conference on Information Retrieval. Berlin, Springer, 314–325. (LNCS.)
Abstract: In this paper we propose an efficient queried-by-example retrieval system which is able to retrieve trademark images by similarity from patent and trademark offices' digital libraries. Logo images are described by both their semantic content, by means of the Vienna codes, and their visual contents, by using shape and color as visual cues. The trademark descriptors are then indexed by a locality-sensitive hashing data structure aiming to perform approximate k-NN search in high dimensional spaces in sub-linear time. The resulting ranked lists are combined by using the Condorcet method and a relevance feedback step helps to iteratively revise the query and refine the obtained results. The experiments demonstrate the effectiveness and efficiency of this system on a realistic and large dataset.
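Illustration: the rank-fusion step mentioned in the abstract combines several ranked lists with the Condorcet method. The sketch below shows one common way to do this, pairwise majority voting with a Copeland-style win count. The abstract does not state which Condorcet variant the authors use, so the scoring rule, the function name `condorcet_fuse`, the tie-breaking by name, and the toy logo identifiers are all assumptions.

```python
from itertools import combinations

def condorcet_fuse(rankings):
    """Fuse ranked lists by pairwise majority voting (Copeland-style win count).

    Each ranking is a list of item ids, best first. An item missing from a
    list is treated as ranked below every item that list contains.
    """
    items = sorted({item for ranking in rankings for item in ranking})
    positions = [{item: i for i, item in enumerate(r)} for r in rankings]
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        # Count how many lists prefer a over b, and b over a.
        a_votes = sum(1 for p in positions if p.get(a, len(p)) < p.get(b, len(p)))
        b_votes = sum(1 for p in positions if p.get(b, len(p)) < p.get(a, len(p)))
        if a_votes > b_votes:
            wins[a] += 1
        elif b_votes > a_votes:
            wins[b] += 1
    # More pairwise victories ranks higher; ties are broken alphabetically.
    return sorted(items, key=lambda item: (-wins[item], item))

# Hypothetical ranked lists from three cues: semantic codes, shape, and color.
by_semantics = ["logo_a", "logo_b", "logo_c"]
by_shape = ["logo_b", "logo_a", "logo_c"]
by_color = ["logo_a", "logo_c", "logo_b"]
fused = condorcet_fuse([by_semantics, by_shape, by_color])
```

Here `logo_a` beats both other logos in a majority of lists and `logo_b` beats `logo_c`, so the fused order is `logo_a`, `logo_b`, `logo_c`.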
|
|
|
Ali Furkan Biten, Ruben Tito, Lluis Gomez, Ernest Valveny and Dimosthenis Karatzas. 2022. OCR-IDL: OCR Annotations for Industry Document Library Dataset. ECCV Workshop on Text in Everything.
Abstract: Pretraining has proven successful in Document Intelligence tasks, where a deluge of documents is used to pretrain models that are only later finetuned on downstream tasks. One of the problems of pretraining approaches is the inconsistent usage of pretraining data with different OCR engines, leading to incomparable results between models. In other words, it is not obvious whether a performance gain comes from differing amounts of data and distinct OCR engines or from the proposed models. To remedy the problem, we make public the OCR annotations for IDL documents, obtained with a commercial OCR engine given the superior performance of such engines over open-source OCR models. The contributed dataset (OCR-IDL) has an estimated monetary value over 20K US$. It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence. All of our data and its collection process with the annotations can be found in this https URL.
|
|
|
Carlos David Martinez Hinarejos and 10 others. 2016. Context, multimodality, and user collaboration in handwritten text processing: the CoMUN-HaT project. 3rd IberSPEECH.
Abstract: Processing of handwritten documents is a task that is of wide interest for many purposes, such as those related to preserving cultural heritage. Handwritten text recognition techniques have been successfully applied during the last decade to obtain transcriptions of handwritten documents, and keyword spotting techniques have been applied for searching specific terms in image collections of handwritten documents. However, results on transcription and indexing are far from perfect. In this framework, the use of new data sources arises as a new paradigm that will allow for a better transcription and indexing of handwritten documents. Three main data sources could be considered: context of the document (style, writer, historical time, topics, ...), multimodal data (representations of the document in a different modality, such as the speech signal of the dictation of the text), and user feedback (corrections, amendments, ...). The CoMUN-HaT project aims at the integration of these different data sources into the transcription and indexing task for handwritten documents: the use of context derived from the analysis of the documents, how multimodality can aid the recognition process to obtain more accurate transcriptions (including transcription in a modern version of the language), and integration into a user-in-the-loop assisted text transcription framework. This will be reflected in the construction of a transcription and indexing platform that can be used by both professional and nonprofessional users, contributing to crowd-sourcing activities to preserve cultural heritage and to obtain an accessible version of the involved corpus.
|
|
|
Fernando Vilariño, Dimosthenis Karatzas and Alberto Valcarce. 2018. The Library Living Lab Barcelona: A participative approach to technology as an enabling factor for innovation in cultural spaces.
|
|
|
Fernando Vilariño, Dimosthenis Karatzas and Alberto Valcarce. 2018. Libraries as New Innovation Hubs: The Library Living Lab. 30th ISPIM Innovation Conference.
Abstract: Libraries are in deep transformation both in the EU and around the world, and they are thriving within a great window of opportunity for innovation. In this paper, we show how the Library Living Lab in Barcelona participated in this changing scenario and contributed to creating the Bibliolab program, where more than 200 public libraries give voice to their users in a global user-centric innovation initiative, using technology as an enabling factor. The Library Living Lab is a real 4-helix implementation where Universities, Research Centers, Public Administration, Companies and the Neighbors are joined together to explore how technology transforms the cultural experience of people. This case is an example of scalability and provides reference tools for policy making, sustainability, user engagement methodologies and governance. We provide specific examples of new prototypes and services that help to understand how to redefine the role of the Library as a real hub for social innovation.
|
|
|
Alicia Fornes, Sergio Escalera, Josep Llados and Gemma Sanchez. 2007. Symbol Recognition by Multi-class Blurred Shape Models. Seventh IAPR International Workshop on Graphics Recognition. 11–13.
|
|
|
Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas and Andrew Bagdanov. 2016. Improving Text Proposals for Scene Images with Fully Convolutional Networks. 23rd International Conference on Pattern Recognition Workshops.
Abstract: Text Proposals have emerged as a class-dependent version of object proposals – efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state-of-the-art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of [1], combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over the current state of the art.
|
|