toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links
Author Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Learning from# Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Workshops Abbreviated Journal  
  Volume 11134 Issue Pages 530-544  
  Keywords  
  Abstract (up) Massive tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in some neighborhoods. In this work we gather Instagram data related to Barcelona consisting on images-captions pairs and, using the text as a supervisory signal, we learn relations between images, words and neighborhoods. Our goal is to learn which visual elements appear in photos when people is posting about each neighborhood. We perform a language separate treatment of the data and show that it can be extrapolated to a tourists and locals separate analysis, and that tourism is reflected in Social Media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate to the different neighborhoods. The proposed method, which can be extended to other cities or subjects, proves that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful to analyze publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona and the code used in the analysis.  
  Address Munich; Alemanya; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCVW  
  Notes DAG; 600.129; 601.338; 600.121 Approved no  
  Call Number Admin @ si @ GGG2018b Serial 3176  
Permanent link to this record
 

 
Author Miquel Ferrer; Ernest Valveny; F. Serratosa edit  doi
openurl 
  Title Median graph: A new exact algorithm using a distance based on the maximum common subgraph Type Journal Article
  Year 2009 Publication Pattern Recognition Letters Abbreviated Journal PRL  
  Volume 30 Issue 5 Pages 579–588  
  Keywords  
  Abstract (up) Median graphs have been presented as a useful tool for capturing the essential information of a set of graphs. Nevertheless, computation of optimal solutions is a very hard problem. In this work we present a new and more efficient optimal algorithm for the median graph computation. With the use of a particular cost function that permits the definition of the graph edit distance in terms of the maximum common subgraph, and a prediction function in the backtracking algorithm, we reduce the size of the search space, avoiding the evaluation of a great amount of states and still obtaining the exact median. We present a set of experiments comparing our new algorithm against the previous existing exact algorithm using synthetic data. In addition, we present the first application of the exact median graph computation to real data and we compare the results against an approximate algorithm based on genetic search. These experimental results show that our algorithm outperforms the previous existing exact algorithm and in addition show the potential applicability of the exact solutions to real problems.  
  Address  
  Corporate Author Thesis  
  Publisher Elsevier Science Inc. Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 0167-8655 ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number DAG @ dag @ FVS2009a Serial 1114  
Permanent link to this record
 

 
Author Marçal Rusiñol; J. Chazalon; Jean-Marc Ogier edit  openurl
  Title Normalisation et validation d'images de documents capturées en mobilité Type Conference Article
  Year 2014 Publication Colloque International Francophone sur l'Écrit et le Document Abbreviated Journal  
  Volume Issue Pages 109-124  
  Keywords mobile document image acquisition; perspective correction; illumination correction; quality assessment; focus measure; OCR accuracy prediction  
  Abstract (up) Mobile document image acquisition integrates many distortions which must be corrected or detected on the device, before the document becomes unavailable or paying data transmission fees. In this paper, we propose a system to correct perspective and illumination issues, and estimate the sharpness of the image for OCR recognition. The correction step relies on fast and accurate border detection followed by illumination normalization. Its evaluation on a private dataset shows a clear improvement on OCR accuracy. The quality assessment
step relies on a combination of focus measures. Its evaluation on a public dataset shows that this simple method compares well to state of the art, learning-based methods which cannot be embedded on a mobile, and outperforms metric-based methods.
 
  Address Nancy; France; March 2014  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CIFED  
  Notes DAG; 601.223; 600.077 Approved no  
  Call Number Admin @ si @ RCO2014b Serial 2546  
Permanent link to this record
 

 
Author Marçal Rusiñol; J. Chazalon; Jean-Marc Ogier edit  doi
isbn  openurl
  Title Combining Focus Measure Operators to Predict OCR Accuracy in Mobile-Captured Document Images Type Conference Article
  Year 2014 Publication 11th IAPR International Workshop on Document Analysis and Systems Abbreviated Journal  
  Volume Issue Pages 181 - 185  
  Keywords  
  Abstract (up) Mobile document image acquisition is a new trend raising serious issues in business document processing workflows. Such digitization procedure is unreliable, and integrates many distortions which must be detected as soon as possible, on the mobile, to avoid paying data transmission fees, and losing information due to the inability to re-capture later a document with temporary availability. In this context, out-of-focus blur is major issue: users have no direct control over it, and it seriously degrades OCR recognition. In this paper, we concentrate on the estimation of focus quality, to ensure a sufficient legibility of a document image for OCR processing. We propose two contributions to improve OCR accuracy prediction for mobile-captured document images. First, we present 24 focus measures, never tested on document images, which are fast to compute and require no training. Second, we show that a combination of those measures enables state-of-the art performance regarding the correlation with OCR accuracy. The resulting approach is fast, robust, and easy to implement in a mobile device. Experiments are performed on a public dataset, and precise details about image processing are given.  
  Address Tours; France; April 2014  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-4799-3243-6 Medium  
  Area Expedition Conference DAS  
  Notes DAG; 601.223; 600.077 Approved no  
  Call Number Admin @ si @ RCO2014a Serial 2545  
Permanent link to this record
 

 
Author Emanuel Indermühle; Volkmar Frinken; Horst Bunke edit   pdf
doi  isbn
openurl 
  Title Mode Detection in Online Handwritten Documents using BLSTM Neural Networks Type Conference Article
  Year 2012 Publication 13th International Conference on Frontiers in Handwriting Recognition Abbreviated Journal  
  Volume Issue Pages 302-307  
  Keywords  
  Abstract (up) Mode detection in online handwritten documents refers to the process of distinguishing different types of contents, such as text, formulas, diagrams, or tables, one from another. In this paper a new approach to mode detection is proposed that uses bidirectional long-short term memory (BLSTM) neural networks. The BLSTM neural network is a novel type of recursive neural network that has been successfully applied in speech and handwriting recognition. In this paper we show that it has the potential to significantly outperform traditional methods for mode detection, which are usually based on stroke classification. As a further advantage over previous approaches, the proposed system is trainable and does not rely on user-defined heuristics. Moreover, it can be easily adapted to new or additional types of modes by just providing the system with new training data.  
  Address Bari, italy  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-4673-2262-1 Medium  
  Area Expedition Conference ICFHR  
  Notes DAG Approved no  
  Call Number Admin @ si @ IFB2012 Serial 2056  
Permanent link to this record
 

 
Author M. Visani; Oriol Ramos Terrades; Salvatore Tabbone edit  doi
openurl 
  Title A Protocol to Characterize the Descriptive Power and the Complementarity of Shape Descriptors Type Journal Article
  Year 2011 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 14 Issue 1 Pages 87-100  
  Keywords Document analysis; Shape descriptors; Symbol description; Performance characterization; Complementarity analysis  
  Abstract (up) Most document analysis applications rely on the extraction of shape descriptors, which may be grouped into different categories, each category having its own advantages and drawbacks (O.R. Terrades et al. in Proceedings of ICDAR’07, pp. 227–231, 2007). In order to improve the richness of their description, many authors choose to combine multiple descriptors. Yet, most of the authors who propose a new descriptor content themselves with comparing its performance to the performance of a set of single state-of-the-art descriptors in a specific applicative context (e.g. symbol recognition, symbol spotting...). This results in a proliferation of the shape descriptors proposed in the literature. In this article, we propose an innovative protocol, the originality of which is to be as independent of the final application as possible and which relies on new quantitative and qualitative measures. We introduce two types of measures: while the measures of the first type are intended to characterize the descriptive power (in terms of uniqueness, distinctiveness and robustness towards noise) of a descriptor, the second type of measures characterizes the complementarity between multiple descriptors. Characterizing upstream the complementarity of shape descriptors is an alternative to the usual approach where the descriptors to be combined are selected by trial and error, considering the performance characteristics of the overall system. To illustrate the contribution of this protocol, we performed experimental studies using a set of descriptors and a set of symbols which are widely used by the community namely ART and SC descriptors and the GREC 2003 database.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; IF 1.091 Approved no  
  Call Number Admin @ si @VRT2011 Serial 1856  
Permanent link to this record
 

 
Author Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  openurl
  Title TextProposals: a Text‐specific Selective Search Algorithm for Word Spotting in the Wild Type Journal Article
  Year 2017 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 70 Issue Pages 60-74  
  Keywords  
  Abstract (up) Motivated by the success of powerful while expensive techniques to recognize words in a holistic way (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016) object proposals techniques emerge as an alternative to the traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way.

Our experiments demonstrate that the presented method is superior in its ability of producing good quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published results. Concretely, in the challenging ICDAR2015 Incidental Text dataset, we overcome in more than 10% F-score the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015). Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.084; 601.197; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GoK2017 Serial 2886  
Permanent link to this record
 

 
Author Sergio Escalera; Alicia Fornes; Oriol Pujol; Petia Radeva edit  doi
isbn  openurl
  Title Multi-class Binary Symbol Classification with Circular Blurred Shape Models Type Conference Article
  Year 2009 Publication 15th International Conference on Image Analysis and Processing Abbreviated Journal  
  Volume 5716 Issue Pages 1005–1014  
  Keywords  
  Abstract (up) Multi-class binary symbol classification requires the use of rich descriptors and robust classifiers. Shape representation is a difficult task because of several symbol distortions, such as occlusions, elastic deformations, gaps or noise. In this paper, we present the Circular Blurred Shape Model descriptor. This descriptor encodes the arrangement information of object parts in a correlogram structure. A prior blurring degree defines the level of distortion allowed to the symbol. Moreover, we learn the new feature space using a set of Adaboost classifiers, which are combined in the Error-Correcting Output Codes framework to deal with the multi-class categorization problem. The presented work has been validated over different multi-class data sets, and compared to the state-of-the-art descriptors, showing significant performance improvements.  
  Address Salerno, Italy  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN 0302-9743 ISBN 978-3-642-04145-7 Medium  
  Area Expedition Conference ICIAP  
  Notes MILAB;HuPBA;DAG Approved no  
  Call Number BCNPCL @ bcnpcl @ EFP2009c Serial 1186  
Permanent link to this record
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades edit   pdf
doi  openurl
  Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 139 Issue Pages 109419  
  Keywords  
  Abstract (up) Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space}. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ BMC2023 Serial 3826  
Permanent link to this record
 

 
Author Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas edit   pdf
doi  openurl
  Title LSDE: Levenshtein Space Deep Embedding for Query-by-string Word Spotting Type Conference Article
  Year 2017 Publication 14th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (up) n this paper we present the LSDE string representation and its application to handwritten word spotting. LSDE is a novel embedding approach for representing strings that learns a space in which distances between projected points are correlated with the Levenshtein edit distance between the original strings.
We show how such a representation produces a more semantically interpretable retrieval from the user’s perspective than other state of the art ones such as PHOC and DCToW. We also conduct a preliminary handwritten word spotting experiment on the George Washington dataset.
 
  Address Kyoto; Japan; November 2017  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.084; 600.121 Approved no  
  Call Number Admin @ si @ GRK2017 Serial 2999  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: