|
Jean-Christophe Burie, J. Chazalon, M. Coustaty, S. Eskenazi, Muhammad Muzzamil Luqman, M. Mehri, et al. (2015). ICDAR2015 Competition on Smartphone Document Capture and OCR (SmartDoc). In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 1161–1165).
Abstract: Smartphones are enabling new ways of capture,
hence arises the need for seamless and reliable acquisition and
digitization of documents, in order to convert them to editable,
searchable and a more human-readable format. Current stateof-the-art
works lack databases and baseline benchmarks for
digitizing mobile captured documents. We have organized a
competition for mobile document capture and OCR in order to
address this issue. The competition is structured into two independent
challenges: smartphone document capture, and smartphone
OCR. This report describes the datasets for both challenges
along with their ground truth, details the performance evaluation
protocols which we used, and presents the final results of the
participating methods. In total, we received 13 submissions: 8
for challenge-I, and 5 for challenge-2.
|
|
|
Marçal Rusiñol, David Aldavert, Ricardo Toledo, & Josep Llados. (2015). Towards Query-by-Speech Handwritten Keyword Spotting. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 501–505).
Abstract: In this paper, we present a new querying paradigm for handwritten keyword spotting. We propose to represent handwritten word images both by visual and audio representations, enabling a query-by-speech keyword spotting system. The two representations are merged together and projected to a common sub-space in the training phase. This transform allows to, given a spoken query, retrieve word instances that were only represented by the visual modality. In addition, the same method can be used backwards at no additional cost to produce a handwritten text-tospeech system. We present our first results on this new querying mechanism using synthetic voices over the George Washington
dataset.
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, Josep Llados, R.Jain, & D.Doermann. (2015). Novel Line Verification for Multiple Instance Focused Retrieval in Document Collections. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 481–485).
|
|
|
Marçal Rusiñol, J. Chazalon, Jean-Marc Ogier, & Josep Llados. (2015). A Comparative Study of Local Detectors and Descriptors for Mobile Document Classification. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 596–600).
Abstract: In this paper we conduct a comparative study of local key-point detectors and local descriptors for the specific task of mobile document classification. A classification architecture based on direct matching of local descriptors is used as baseline for the comparative study. A set of four different key-point
detectors and four different local descriptors are tested in all the possible combinations. The experiments are conducted in a database consisting of 30 model documents acquired on 6 different backgrounds, totaling more than 36.000 test images.
|
|
|
J. Chazalon, Marçal Rusiñol, Jean-Marc Ogier, & Josep Llados. (2015). A Semi-Automatic Groundtruthing Tool for Mobile-Captured Document Segmentation. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 621–625).
Abstract: This paper presents a novel way to generate groundtruth data for the evaluation of mobile document capture systems, focusing on the first stage of the image processing pipeline involved: document object detection and segmentation in lowquality preview frames. We introduce and describe a simple, robust and fast technique based on color markers which enables a semi-automated annotation of page corners. We also detail a technique for marker removal. Methods and tools presented in the paper were successfully used to annotate, in few hours, 24889
frames in 150 video files for the smartDOC competition at ICDAR 2015
|
|
|
David Roche. (2015). A Statistical Framework for Terminating Evolutionary Algorithms at their Steady State (Debora Gil, & Jesus Giraldo, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: As any iterative technique, it is a necessary condition a stop criterion for terminating Evolutionary Algorithms (EA). In the case of optimization methods, the algorithm should stop at the time it has reached a steady state so it can not improve results anymore. Assessing the reliability of termination conditions for EAs is of prime importance. A wrong or weak stop criterion can negatively aect both the computational eort and the nal result.
In this Thesis, we introduce a statistical framework for assessing whether a termination condition is able to stop EA at its steady state. In one hand a numeric approximation to steady states to detect the point in which EA population has lost its diversity has been presented for EA termination. This approximation has been applied to dierent EA paradigms based on diversity and a selection of functions covering the properties most relevant for EA convergence. Experiments show that our condition works regardless of the search space dimension and function landscape and Dierential Evolution (DE) arises as the best paradigm. On the other hand, we use a regression model in order to determine the requirements ensuring that a measure derived from EA evolving population is related to the distance to the optimum in xspace.
Our theoretical framework is analyzed across several benchmark test functions
and two standard termination criteria based on function improvement in f-space and EA population x-space distribution for the DE paradigm. Results validate our statistical framework as a powerful tool for determining the capability of a measure for terminating EA and select the x-space distribution as the best-suited for accurately stopping DE in real-world applications.
|
|
|
Patricia Marquez. (2015). A Confidence Framework for the Assessment of Optical Flow Performance (Debora Gil, & Aura Hernandez, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: Optical Flow (OF) is the input of a wide range of decision support systems such as car driver assistance, UAV guiding or medical diagnose. In these real situations, the absence of ground truth forces to assess OF quality using quantities computed from either sequences or the computed optical flow itself. These quantities are generally known as Confidence Measures, CM. Even if we have a proper confidence measure we still need a way to evaluate its ability to discard pixels with an OF prone to have a large error. Current approaches only provide a descriptive evaluation of the CM performance but such approaches are not capable to fairly compare different confidence measures and optical flow algorithms. Thus, it is of prime importance to define a framework and a general road map for the evaluation of optical flow performance.
This thesis provides a framework able to decide which pairs “ optical flow – confidence measure” (OF-CM) are best suited for optical flow error bounding given a confidence level determined by a decision support system. To design this framework we cover the following points:
Descriptive scores. As a first step, we summarize and analyze the sources of inaccuracies in the output of optical flow algorithms. Second, we present several descriptive plots that visually assess CM capabilities for OF error bounding. In addition to the descriptive plots, given a plot representing OF-CM capabilities to bound the error, we provide a numeric score that categorizes the plot according to its decreasing profile, that is, a score assessing CM performance.
Statistical framework. We provide a comparison framework that assesses the best suited OF-CM pair for error bounding that uses a two stage cascade process. First of all we assess the predictive value of the confidence measures by means of a descriptive plot. Then, for a sample of descriptive plots computed over training frames, we obtain a generic curve that will be used for sequences with no ground truth. As a second step, we evaluate the obtained general curve and its capabilities to really reflect the predictive value of a confidence measure using the variability across train frames by means of ANOVA.
The presented framework has shown its potential in the application on clinical decision support systems. In particular, we have analyzed the impact of the different image artifacts such as noise and decay to the output of optical flow in a cardiac diagnose system and we have improved the navigation inside the bronchial tree on bronchoscopy.
|
|
|
Marc Serra. (2015). Modeling, estimation and evaluation of intrinsic images considering color information (Robert Benavente, & Olivier Penacchio, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: Image values are the result of a combination of visual information coming from multiple sources. Recovering information from the multiple factors thatproduced an image seems a hard and ill-posed problem. However, it is important to observe that humans develop the ability to interpret images and recognize and isolate specific physical properties of the scene.
Images describing a single physical characteristic of an scene are called intrinsic images. These images would benefit most computer vision tasks which are often affected by the multiple complex effects that are usually found in natural images (e.g. cast shadows, specularities, interreflections...).
In this thesis we analyze the problem of intrinsic image estimation from different perspectives, including the theoretical formulation of the problem, the visual cues that can be used to estimate the intrinsic components and the evaluation mechanisms of the problem.
|
|
|
Lluis Pere de las Heras, David Fernandez, Alicia Fornes, Ernest Valveny, Gemma Sanchez, & Josep Llados. (2013). Runlength Histogram Image Signature for Perceptual Retrieval of Architectural Floor Plans. In 10th IAPR International Workshop on Graphics Recognition.
|
|
|
Dimosthenis Karatzas, Lluis Gomez, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, et al. (2015). ICDAR 2015 Competition on Robust Reading. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 1156–1160).
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2015). Object Proposals for Text Extraction in the Wild. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 206–210).
Abstract: Object Proposals is a recent computer vision technique receiving increasing interest from the research community. Its main objective is to generate a relatively small set of bounding box proposals that are most likely to contain objects of interest. The use of Object Proposals techniques in the scene text understanding field is innovative. Motivated by the success of powerful while expensive techniques to recognize words in a holistic way, Object Proposals techniques emerge as an alternative to the traditional text detectors. In this paper we study to what extent the existing generic Object Proposals methods may be useful for scene text understanding. Also, we propose a new Object Proposals algorithm that is specifically designed for text and compare it with other generic methods in the state of the art. Experiments show that our proposal is superior in its ability of producing good quality word proposals in an efficient way. The source code of our method is made publicly available
|
|
|
Anguelos Nicolaou, Andrew Bagdanov, Marcus Liwicki, & Dimosthenis Karatzas. (2015). Sparse Radial Sampling LBP for Writer Identification. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 716–720).
Abstract: In this paper we present the use of Sparse Radial Sampling Local Binary Patterns, a variant of Local Binary Patterns (LBP) for text-as-texture classification. By adapting and extending the standard LBP operator to the particularities of text we get a generic text-as-texture classification scheme and apply it to writer identification. In experiments on CVL and ICDAR 2013 datasets, the proposed feature-set demonstrates State-Of-the-Art (SOA) performance. Among the SOA, the proposed method is the only one that is based on dense extraction of a single local feature descriptor. This makes it fast and applicable at the earliest stages in a DIA pipeline without the need for segmentation, binarization, or extraction of multiple features.
|
|
|
Suman Ghosh, Lluis Gomez, Dimosthenis Karatzas, & Ernest Valveny. (2015). Efficient indexing for Query By String text retrieval. In 6th IAPR International Workshop on Camera Based Document Analysis and Recognition CBDAR2015 (pp. 1236–1240).
Abstract: This paper deals with Query By String word spotting in scene images. A hierarchical text segmentation algorithm based on text specific selective search is used to find text regions. These regions are indexed per character n-grams present in the text region. An attribute representation based on Pyramidal Histogram of Characters (PHOC) is used to compare text regions with the query text. For generation of the index a similar attribute space based Pyramidal Histogram of character n-grams is used. These attribute models are learned using linear SVMs over the Fisher Vector [1] representation of the images along with the PHOC labels of the corresponding strings.
|
|
|
J.Kuhn, A.Nussbaumer, J.Pirker, Dimosthenis Karatzas, A. Pagani, O.Conlan, et al. (2015). Advancing Physics Learning Through Traversing a Multi-Modal Experimentation Space. In Workshop Proceedings on the 11th International Conference on Intelligent Environments (Vol. 19, pp. 373–380).
Abstract: Translating conceptual knowledge into real world experiences presents a significant educational challenge. This position paper presents an approach that supports learners in moving seamlessly between conceptual learning and their application in the real world by bringing physical and virtual experiments into everyday settings. Learners are empowered in conducting these situated experiments in a variety of physical settings by leveraging state of the art mobile, augmented reality, and virtual reality technology. A blend of mobile-based multi-sensory physical experiments, augmented reality and enabling virtual environments can allow learners to bridge their conceptual learning with tangible experiences in a completely novel manner. This approach focuses on the learner by applying self-regulated personalised learning techniques, underpinned by innovative pedagogical approaches and adaptation techniques, to ensure that the needs and preferences of each learner are catered for individually.
|
|
|
Lluis Pere de las Heras, Ernest Valveny, & Gemma Sanchez. (2013). Unsupervised and Notation-Independent Wall Segmentation in Floor Plans Using a Combination of Statistical and Structural Strategies. In 10th IAPR International Workshop on Graphics Recognition.
|
|