|
Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornes, Yousri Kessentini, et al. (2023). Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (Vol. 37).
Abstract: In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR
Keywords: Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning
|
|
|
Antonio Clavelli, & Dimosthenis Karatzas. (2009). Text Segmentation in Colour Posters from the Spanish Civil War Era. In 10th International Conference on Document Analysis and Recognition (pp. 181–185).
Abstract: The extraction of textual content from colour documents of a graphical nature is a complicated task. The text can be rendered in any colour, size and orientation while the existence of complex background graphics with repetitive patterns can make its localization and segmentation extremely difficult.
Here, we propose a new method for extracting textual content from such colour images that makes no assumption as to the size of the characters, their orientation or colour, while it is tolerant to characters that do not follow a straight baseline. We evaluate this method on a collection of documents with historical
connotations: the Posters from the Spanish Civil War.
|
|
|
Klara Janousckova, Jiri Matas, Lluis Gomez, & Dimosthenis Karatzas. (2020). Text Recognition – Real World Data and Where to Find Them. In 25th International Conference on Pattern Recognition (pp. 4489–4496).
Abstract: We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions. The method includes matching of imprecise transcriptions to weak annotations and an edit distance guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as “pseudo ground truth” (PGT). The method is applied to two weakly-annotated datasets. Training with the extracted PGT consistently improves the accuracy of a state of the art recognition model, by 3.7% on average, across different benchmark datasets (image domains) and 24.5% on one of the weakly annotated datasets 1 1 Acknowledgements. The authors were supported by Czech Technical University student grant SGS20/171/0HK3/3TJ13, the MEYS VVV project CZ.02.1.01/0.010.0J16 019/0000765 Research Center for Informatics, the Spanish Research project TIN2017-89779-P and the CERCA Programme / Generalitat de Catalunya.
|
|
|
Partha Pratim Roy, Umapada Pal, & Josep Llados. (2012). Text line extraction in graphical documents using background and foreground. IJDAR - International Journal on Document Analysis and Recognition, 15(3), 227–241.
Abstract: 0,405 JCR
In graphical documents (e.g., maps, engineering drawings), artistic documents etc., the text lines are annotated in multiple orientations or curvilinear way to illustrate different locations or symbols. For the optical character recognition of such documents, individual text lines from the documents need to be extracted. In this paper, we propose a novel method to segment such text lines and the method is based on the foreground and background information of the text components. To effectively utilize the background information, a water reservoir concept is used here. In the proposed scheme, at first, individual components are detected and grouped into character clusters in a hierarchical way using size and positional information. Next, the clusters are extended in two extreme sides to determine potential candidate regions. Finally, with the help of these candidate regions,
individual lines are extracted. The experimental results are presented on different datasets of graphical documents, camera-based warped documents, noisy images containing seals, etc. The results demonstrate that our approach is robust and invariant to size and orientation of the text lines present in
the document.
|
|
|
Sergio Escalera, Xavier Baro, Jordi Vitria, & Petia Radeva. (2009). Text Detection in Urban Scenes (video sample). In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, 35–44).
Abstract: Abstract. Text detection in urban scenes is a hard task due to the high variability of text appearance: different text fonts, changes in the point of view, or partial occlusion are just a few problems. Text detection can be specially suited for georeferencing business, navigation, tourist assistance, or to help visual impaired people. In this paper, we propose a general methodology to deal with the problem of text detection in outdoor scenes. The method is based on learning spatial information of gradient based features and Census Transform images using a cascade of classifiers. The method is applied in the context of Mobile Mapping systems, where a mobile vehicle captures urban image sequences. Moreover, a cover data set is presented and tested with the new methodology. The results show high accuracy when detecting multi-linear text regions with high variability of appearance, at same time that it preserves a low false alarm rate compared to classical approaches
|
|
|
Muhammad Anwer Rao, Fahad Shahbaz Khan, Joost Van de Weijer, & Jorma Laaksonen. (2017). Tex-Nets: Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition. In 19th International Conference on Multimodal Interaction.
Abstract: Recognizing materials and textures in realistic imaging conditions is a challenging computer vision problem. For many years, local features based orderless representations were a dominant approach for texture recognition. Recently deep local features, extracted from the intermediate layers of a Convolutional Neural Network (CNN), are used as filter banks. These dense local descriptors from a deep model, when encoded with Fisher Vectors, have shown to provide excellent results for texture recognition. The CNN models, employed in such approaches, take RGB patches as input and train on a large amount of labeled images. We show that CNN models, which we call TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard deep models trained on RGB patches. We further investigate two deep architectures, namely early and late fusion, to combine the texture and color information. Experiments on benchmark texture datasets clearly demonstrate that TEX-Nets provide complementary information to standard RGB deep network. Our approach provides a large gain of 4.8%, 3.5%, 2.6% and 4.1% respectively in accuracy on the DTD, KTH-TIPS-2a, KTH-TIPS-2b and Texture-10 datasets, compared to the standard RGB network of the same architecture. Further, our final combination leads to consistent improvements over the state-of-the-art on all four datasets.
Keywords: Convolutional Neural Networks; Texture Recognition; Local Binary Paterns
|
|
|
Hans Stadthagen-Gonzalez, M. Carmen Parafita, C. Alejandro Parraga, & Markus F. Damian. (2019). Testing alternative theoretical accounts of code-switching: Insights from comparative judgments of adjective noun order. IJB - International journal of bilingualism: interdisciplinary studies of multilingual behaviour, 23(1), 200–220.
Abstract: Objectives:
Spanish and English contrast in adjective–noun word order: for example, brown dress (English) vs. vestido marrón (‘dress brown’, Spanish). According to the Matrix Language model (MLF) word order in code-switched sentences must be compatible with the word order of the matrix language, but working within the minimalist program (MP), Cantone and MacSwan arrived at the descriptive generalization that the position of the noun phrase relative to the adjective is determined by the adjective’s language. Our aim is to evaluate the predictions derived from these two models regarding adjective–noun order in Spanish–English code-switched sentences.
Methodology:
We contrasted the predictions from both models regarding the acceptability of code-switched sentences with different adjective–noun orders that were compatible with the MP, the MLF, both, or none. Acceptability was assessed in Experiment 1 with a 5-point Likert and in Experiment 2 with a 2-Alternative Forced Choice (2AFC) task.
|
|
|
Marc Masana, Tinne Tuytelaars, & Joost Van de Weijer. (2021). Ternary Feature Masks: zero-forgetting for task-incremental learning. In 34th IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 3565–3574).
Abstract: We propose an approach without any forgetting to continual learning for the task-aware regime, where at inference the task-label is known. By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them. Using masks prevents both catastrophic forgetting and backward transfer. We argue -- and show experimentally -- that avoiding the former largely compensates for the lack of the latter, which is rarely observed in practice. In contrast to earlier works, our masks are applied to the features (activations) of each layer instead of the weights. This considerably reduces the number of mask parameters for each new task; with more than three orders of magnitude for most networks. The encoding of the ternary masks into two bits per feature creates very little overhead to the network, avoiding scalability issues. To allow already learned features to adapt to the current task without changing the behavior of these features for previous tasks, we introduce task-specific feature normalization. Extensive experiments on several finegrained datasets and ImageNet show that our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
|
|
|
Debora Gil, David Roche, Agnes Borras, & Jesus Giraldo. (2015). Terminating Evolutionary Algorithms at their Steady State. COA - Computational Optimization and Applications, 61(2), 489–515.
Abstract: Assessing the reliability of termination conditions for evolutionary algorithms (EAs) is of prime importance. An erroneous or weak stop criterion can negatively affect both the computational effort and the final result. We introduce a statistical framework for assessing whether a termination condition is able to stop an EA at its steady state, so that its results can not be improved anymore. We use a regression model in order to determine the requirements ensuring that a measure derived from EA evolving population is related to the distance to the optimum in decision variable space. Our framework is analyzed across 24 benchmark test functions and two standard termination criteria based on function fitness value in objective function space and EA population decision variable space distribution for the differential evolution (DE) paradigm. Results validate our framework as a powerful tool for determining the capability of a measure for terminating EA and the results also identify the decision variable space distribution as the best-suited for accurately terminating DE in real-world applications.
Keywords: Evolutionary algorithms; Termination condition; Steady state; Differential evolution
|
|
|
Neelu Madan, Arya Farkhondeh, Kamal Nasrollahi, Sergio Escalera, & Thomas B. Moeslund. (2021). Temporal Cues From Socially Unacceptable Trajectories for Anomaly Detection. In IEEE/CVF International Conference on Computer Vision Workshops (pp. 2150–2158).
Abstract: State-of-the-Art (SoTA) deep learning-based approaches to detect anomalies in surveillance videos utilize limited temporal information, including basic information from motion, e.g., optical flow computed between consecutive frames. In this paper, we compliment the SoTA methods by including long-range dependencies from trajectories for anomaly detection. To achieve that, we first created trajectories by running a tracker on two SoTA datasets, namely Avenue and Shanghai-Tech. We propose a prediction-based anomaly detection method using trajectories based on Social GANs, also called in this paper as temporal-based anomaly detection. Then, we hypothesize that late fusion of the result of this temporal-based anomaly detection system with spatial-based anomaly detection systems produces SoTA results. We verify this hypothesis on two spatial-based anomaly detection systems. We show that both cases produce results better than baseline spatial-based systems, indicating the usefulness of the temporal information coming from the trajectories for anomaly detection. We observe that the proposed approach depicts the maximum improvement in micro-level Area-Under-the-Curve (AUC) by 4.1% on CUHK Avenue and 3.4% on Shanghai-Tech over one of the baseline method. We also show a high performance on cross-data evaluation, where we learn the weights to combine spatial and temporal information on Shanghai-Tech and perform evaluation on CUHK Avenue and vice-versa.
|
|
|
Javad Zolfaghari Bengar, Abel Gonzalez-Garcia, Gabriel Villalonga, Bogdan Raducanu, Hamed H. Aghdam, Mikhail Mozerov, et al. (2019). Temporal Coherence for Active Learning in Videos. In IEEE International Conference on Computer Vision Workshops (pp. 914–923).
Abstract: Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our active learning criterion is based on the estimated number of errors in terms of false positives and false negatives. The detections obtained by the object detector are used to define the nodes of a graph and tracked forward and backward to temporally link the nodes. Minimizing an energy function defined on this graphical model provides estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two datasets.
|
|
|
Antonio Lopez, J. Hilgenstock, A. Busse, Ramon Baldrich, Felipe Lumbreras, & Joan Serrat. (2008). Temporal Coherence Analysis for Intelligent Headlight Control.
Keywords: Intelligent Headlights
|
|
|
A. Pujol, Felipe Lumbreras, Javier Varona, & Juan J. Villanueva. (1999). Template matching through invariant eigenspace projection..
|
|
|
Craig Von Land, Ricardo Toledo, & Juan J. Villanueva. (1997). TeleRegions: Application of Telematics in Cardiac Care..
|
|
|
Craig Von Land, Ricardo Toledo, & Juan J. Villanueva. (1996). TeleRegion: Tele-Applications for European Regions.
|
|