|
Rahat Khan, Joost Van de Weijer, Dimosthenis Karatzas and Damien Muselet. 2013. Towards multispectral data acquisition with hand-held devices. 20th IEEE International Conference on Image Processing. 2053–2057.
Abstract: We propose a method to acquire multispectral data with handheld devices with front-mounted RGB cameras. We propose to use the display of the device as an illuminant while the camera captures images illuminated by the red, green and blue primaries of the display. Three illuminants and three response functions of the camera lead to nine response values which are used for reflectance estimation. Results are promising and show that the accuracy of the spectral reconstruction improves by 30-40% over the spectral reconstruction based on a single illuminant. Furthermore, we propose to compute a sensor-illuminant aware linear basis by discarding the part of the reflectances that falls in the sensor-illuminant null-space. We show experimentally that optimizing reflectance estimation on these new basis functions decreases the RMSE significantly over basis functions that are independent of the sensor and illuminant. We conclude that multispectral data acquisition is potentially possible with consumer hand-held devices such as tablets, mobiles, and laptops, opening up applications which are currently considered to be unrealistic.
Keywords: Multispectral; mobile devices; color measurements
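Illustrative sketch: the abstract states that three display illuminants and three camera response functions give nine response values. The forward model behind that count can be sketched as below. This is a minimal sketch, not the authors' implementation: the function name `camera_responses`, the three-sample spectra and every numeric value are made-up assumptions chosen only to show how the nine values arise.

```python
from typing import List, Sequence


def camera_responses(
    reflectance: Sequence[float],
    illuminants: Sequence[Sequence[float]],
    sensors: Sequence[Sequence[float]],
) -> List[float]:
    """Return one response per (illuminant, sensor) pair.

    Each response is the discrete sum over wavelength samples of
    sensor sensitivity * illuminant power * surface reflectance.
    """
    return [
        sum(s * e * r for s, e, r in zip(sensor, illuminant, reflectance))
        for illuminant in illuminants
        for sensor in sensors
    ]


# Hypothetical spectra sampled at three wavelengths (short, medium, long).
red = [0.1, 0.2, 1.0]
green = [0.2, 1.0, 0.2]
blue = [1.0, 0.2, 0.1]

display_primaries = [red, green, blue]  # assumed illuminant spectra
camera_channels = [red, green, blue]    # assumed sensor sensitivities
flat_grey = [0.5, 0.5, 0.5]             # assumed surface reflectance

responses = camera_responses(flat_grey, display_primaries, camera_channels)
print(len(responses))  # 3 illuminants x 3 sensors
```

Reflectance estimation would then invert this mapping, for example by fitting the weights of a small set of basis functions to the nine values; that step is not shown here.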
|
|
|
Marçal Rusiñol, J. Chazalon and Jean-Marc Ogier. 2014. Normalisation et validation d'images de documents capturées en mobilité. Colloque International Francophone sur l'Écrit et le Document. 109–124.
Abstract: Mobile document image acquisition introduces many distortions which must be corrected or detected on the device, before the document becomes unavailable or data transmission fees are incurred. In this paper, we propose a system to correct perspective and illumination issues, and to estimate the sharpness of the image for OCR. The correction step relies on fast and accurate border detection followed by illumination normalization. Its evaluation on a private dataset shows a clear improvement in OCR accuracy. The quality assessment step relies on a combination of focus measures. Its evaluation on a public dataset shows that this simple method compares well to state-of-the-art, learning-based methods which cannot be embedded on a mobile device, and outperforms metric-based methods.
Keywords: mobile document image acquisition; perspective correction; illumination correction; quality assessment; focus measure; OCR accuracy prediction
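Illustrative sketch: the abstract does not say which focus measures are combined, so the example below shows only one widely used measure, the variance of a 4-neighbour Laplacian, as an assumption about what such a measure can look like. The function name `laplacian_variance` and the toy images are hypothetical.

```python
from typing import List, Sequence


def laplacian_variance(image: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Sharp images have strong local intensity changes, so the Laplacian
    responses spread widely; blurred or flat images give a low variance.
    """
    height, width = len(image), len(image[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses: List[float] = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# Toy grayscale images: a high-contrast checkerboard and a flat grey patch.
sharp = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
flat = [[128 for _ in range(6)] for _ in range(6)]

sharp_score = laplacian_variance(sharp)
flat_score = laplacian_variance(flat)
print(sharp_score > flat_score)
```

A capture application could compare such a score against a threshold before sending the image to OCR; the threshold would have to be tuned on real data.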
|
|
|
Miquel Ferrer, Ernest Valveny and F. Serratosa. 2009. Median Graphs: A Genetic Approach based on New Theoretical Properties. PR, 42(9), 2003–2012.
Abstract: Given a set of graphs, the median graph has been theoretically presented as a useful concept to infer a representative of the set. However, the computation of the median graph is a highly complex task and its practical application has been very limited up to now. In this work we present two major contributions. On one side, and from a theoretical point of view, we show new theoretical properties of the median graph. On the other side, using these new properties, we present a new approximate algorithm based on genetic search that improves the computation of the median graph. Finally, we perform a set of experiments on real data, where none of the existing algorithms for the median graph computation could be applied up to now due to their computational complexity. With these results, we show how the concept of the median graph can be used in real applications and is no longer a purely theoretical concept, demonstrating, from a practical point of view, that it can be a useful tool to represent a set of graphs.
Keywords: Median graph; Genetic search; Maximum common subgraph; Graph matching; Structural pattern recognition
|
|
|
Josep Llados, Dimosthenis Karatzas, Joan Mas and Gemma Sanchez. 2008. A Generic Architecture for the Conversion of Document Collections into Semantically Annotated Digital Archives.
Keywords: Median Graph, Graph Embedding, Graph Matching, Structural Pattern Recognition
|
|
|
Miquel Ferrer, Dimosthenis Karatzas, Ernest Valveny, I. Bardaji and Horst Bunke. 2011. A Generic Framework for Median Graph Computation based on a Recursive Embedding Approach. CVIU, 115(7), 919–928.
Abstract: The median graph has been shown to be a good choice to obtain a representative of a set of graphs. However, its computation is a complex problem. Recently, graph embedding into vector spaces has been proposed to obtain approximations of the median graph. The problem with such an approach is how to go from a point in the vector space back to a graph in the graph space. The main contribution of this paper is the generalization of this previous method, proposing a generic recursive procedure that permits to recover the graph corresponding to a point in the vector space, introducing only the amount of approximation inherent to the use of graph matching algorithms. In order to evaluate the proposed method, we compare it with the set median and with the other state-of-the-art embedding-based methods for the median graph computation. The experiments are carried out using four different databases (one semi-artificial and three containing real-world data). Results show that with the proposed approach we can obtain better medians, in terms of the sum of distances to the training graphs, than with the previous existing methods.
Keywords: Median Graph, Graph Embedding, Graph Matching, Structural Pattern Recognition
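Illustrative sketch: the abstract uses the set median as a baseline and the sum of distances to the training graphs as the quality criterion. Both can be sketched as below. This is a minimal sketch under strong assumptions: graphs are reduced to sets of labelled edges with a known node correspondence, and `toy_distance` is a stand-in for a real graph matching distance such as graph edit distance. All names and data are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]  # each edge stored once, endpoints in sorted order


def toy_distance(g1: Graph, g2: Graph) -> int:
    """Number of edges present in exactly one of the two graphs."""
    return len(g1 ^ g2)


def sum_of_distances(
    candidate: Graph,
    graphs: Sequence[Graph],
    dist: Callable[[Graph, Graph], float],
) -> float:
    """Criterion used to compare medians: lower is better."""
    return sum(dist(candidate, g) for g in graphs)


def set_median(
    graphs: Sequence[Graph],
    dist: Callable[[Graph, Graph], float],
) -> Graph:
    """Member of the set with the smallest sum of distances to the set."""
    return min(graphs, key=lambda g: sum_of_distances(g, graphs, dist))


training_graphs = [
    frozenset({("a", "b"), ("b", "c")}),
    frozenset({("a", "b"), ("b", "c"), ("c", "d")}),
    frozenset({("a", "b"), ("c", "d")}),
]

median = set_median(training_graphs, toy_distance)
print(sorted(median), sum_of_distances(median, training_graphs, toy_distance))
```

The set median is restricted to graphs already in the set, whereas a generalized median may be any graph, which is why an approximate median can reach a lower sum of distances than this baseline.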
|
|
|
Ali Furkan Biten, Lluis Gomez and Dimosthenis Karatzas. 2022. Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning. Winter Conference on Applications of Computer Vision. 1381–1390.
Abstract: Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is undesirable for humans. To decrease object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences which require no new training data or increase in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online.
Keywords: Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data
|
|
|
Ali Furkan Biten, Andres Mafla, Lluis Gomez and Dimosthenis Karatzas. 2022. Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching. Winter Conference on Applications of Computer Vision. 1391–1400.
Abstract: The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation into existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github.com/furkanbiten/ncsmetric and the model implementation at github.com/andrespmd/semanticadaptive_margin.
Keywords: Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning
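Illustrative sketch: the abstract describes a margin that adapts to a semantic score and is optimized in a standard triplet loss. The example below shows the general idea only. The margin formula in `adaptive_margin` is an assumption made for illustration and is not taken from the paper; the relevance scores stand in for CIDEr values, and all names and numbers are hypothetical.

```python
def adaptive_margin(rel_pos: float, rel_neg: float, scale: float = 0.2) -> float:
    """Hypothetical margin: grows with the gap in semantic relevance.

    rel_pos and rel_neg stand in for a captioning-metric score (such as
    CIDEr) of the positive and negative item with respect to the query.
    A negative that is almost as relevant as the positive gets a small
    margin, so it is pushed away less aggressively.
    """
    return scale * max(0.0, rel_pos - rel_neg)


def triplet_loss(sim_pos: float, sim_neg: float, margin: float) -> float:
    """Standard hinge-based triplet loss on similarity scores."""
    return max(0.0, margin + sim_neg - sim_pos)


margin = adaptive_margin(rel_pos=1.0, rel_neg=0.2)

# Positive already ranked well above the negative: no penalty.
easy = triplet_loss(sim_pos=0.8, sim_neg=0.5, margin=margin)

# Negative too close to the positive for this margin: positive loss.
hard = triplet_loss(sim_pos=0.6, sim_neg=0.55, margin=margin)

print(easy, hard > 0)
```

With a fixed margin every negative is treated alike; letting the margin depend on a relevance score is one way to avoid penalizing non-annotated items that are in fact semantically close to the query.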
|
|
|
Carlos Boned Riera and Oriol Ramos Terrades. 2022. Discriminative Neural Variational Model for Unbalanced Classification Tasks in Knowledge Graph. 26th International Conference on Pattern Recognition. 2186–2191.
Abstract: Approaches to link discovery on Knowledge Graphs have shown significant improvements in recent years. However, their performance is harmed by the unbalanced nature of this classification problem, since many methods are easily biased towards not finding proper links. In this paper we present a discriminative neural variational auto-encoder model, called DNVAE, in which we have introduced latent variables to serve as embedding vectors. As a result, the learnt generative model better approximates the underlying distribution and, at the same time, better differentiates the types of relations in the knowledge graph. We have evaluated this approach on benchmark knowledge graph data and Census records. Results on the latter data set are quite impressive since we reach the highest possible score in the evaluation metrics. However, further experiments are still needed to evaluate the performance of the method more thoroughly on more challenging tasks.
Keywords: Measurement; Couplings; Semantics; Ear; Benchmark testing; Data models; Pattern recognition
|
|
|
A. Kesidis and Dimosthenis Karatzas. 2014. Logo and Trademark Recognition. In D. Doermann and K. Tombre, eds. Handbook of Document Image Processing and Recognition. Springer London, 591–646.
Abstract: The importance of logos and trademarks in today's society is indisputable, variably seen under a positive light as a valuable service for consumers or a negative one as a catalyst of ever-increasing consumerism. This chapter discusses the technical approaches for enabling machines to work with logos, looking into the latest methodologies for logo detection, localization, representation, recognition, retrieval, and spotting in a variety of media. This analysis is presented in the context of three different applications covering the complete depth and breadth of state-of-the-art techniques. These are trademark retrieval systems, logo recognition in document images, and logo detection and removal in images and videos. This chapter, due to the very nature of logos and trademarks, brings together various facets of document image analysis spanning graphical and textual content, while it links document image analysis to other computer vision domains, especially when it comes to the analysis of real-scene videos and images.
Keywords: Logo recognition; Logo removal; Logo spotting; Trademark registration; Trademark retrieval systems
|
|
|
Marçal Rusiñol, J. Chazalon and Jean-Marc Ogier. 2016. Filtrage de descripteurs locaux pour l'amélioration de la détection de documents. Colloque International Francophone sur l'Écrit et le Document.
Abstract: In this paper we propose an effective method aimed at reducing the amount of local descriptors to be indexed in a document matching framework. In an off-line training stage, the matching between the model document and incoming images is computed, retaining the local descriptors from the model that steadily produce good matches. We have evaluated this approach by using the ICDAR2015 SmartDOC dataset containing nearly 25000 images of documents captured by a mobile device. We have tested the performance of this filtering step by using ORB and SIFT local detectors and descriptors. The results show an important gain both in the quality of the final matching and in time and space requirements.
Keywords: Local descriptors; mobile capture; document matching; keypoint selection
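Illustrative sketch: the abstract describes keeping only the model descriptors that steadily produce good matches during an off-line training stage. One simple way to express that selection is a frequency threshold, sketched below. The criterion, the function name `select_stable_descriptors`, the `min_ratio` parameter and the match logs are all assumptions made for illustration; the paper's actual selection rule is not given in the abstract.

```python
from collections import Counter
from typing import List, Sequence


def select_stable_descriptors(
    good_matches_per_image: Sequence[Sequence[int]],
    min_ratio: float = 0.5,
) -> List[int]:
    """Keep model descriptor indices that matched well often enough.

    good_matches_per_image[i] lists the indices of model descriptors
    that produced a good match on training image i. A descriptor is
    kept when it appears in at least min_ratio of the training images.
    """
    n_images = len(good_matches_per_image)
    if n_images == 0:
        return []
    counts: Counter = Counter()
    for matched in good_matches_per_image:
        counts.update(set(matched))  # count each descriptor once per image
    return sorted(i for i, c in counts.items() if c / n_images >= min_ratio)


# Hypothetical match logs for four training images and a model with
# descriptors indexed 0..3.
match_logs = [[0, 1, 2], [0, 2], [0, 3], [0, 2, 2]]

kept = select_stable_descriptors(match_logs, min_ratio=0.5)
print(kept)
```

Indexing only the kept descriptors shrinks the model, which is where the time and space savings reported in the abstract would come from.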
|
|