|
Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez and Dimosthenis Karatzas. 2020. Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features. IEEE Winter Conference on Applications of Computer Vision.
Abstract: Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval.
|
|
|
Raul Gomez, Jaume Gibert, Lluis Gomez and Dimosthenis Karatzas. 2020. Exploring Hate Speech Detection in Multimodal Publications. IEEE Winter Conference on Applications of Computer Vision.
Abstract: In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.
|
|
|
Mohamed Ali Souibgui and 6 others. 2022. DocEnTr: An End-to-End Document Image Enhancement Transformer. 26th International Conference on Pattern Recognition.1699–1705.
Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
Keywords: Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
|
|
|
Oriol Ramos Terrades, Albert Berenguel and Debora Gil. 2022. A Flexible Outlier Detector Based on a Topology Given by Graph Communities. BDR, 29, 100332.
Abstract: Outlier detection is essential for optimal performance of machine learning methods and statistical predictive models. Their detection is especially determinant in small sample size unbalanced problems, since in such settings outliers become highly influential and significantly bias models. This particular experimental settings are usual in medical applications, like diagnosis of rare pathologies, outcome of experimental personalized treatments or pandemic emergencies. In contrast to population-based methods, neighborhood based local approaches compute an outlier score from the neighbors of each sample, are simple flexible methods that have the potential to perform well in small sample size unbalanced problems. A main concern of local approaches is the impact that the computation of each sample neighborhood has on the method performance. Most approaches use a distance in the feature space to define a single neighborhood that requires careful selection of several parameters, like the number of neighbors.
This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space considered as a topological manifold. Topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. This way, we provide with a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine tuning. The extensive experiments on real-world and synthetic data sets show that our approach outperforms, both, local and global strategies in multi and single view settings.
Keywords: Classification algorithms; Detection algorithms; Description of feature space local structure; Graph communities; Machine learning algorithms; Outlier detectors
|
|
|
Albert Gordo, Florent Perronnin and Ernest Valveny. 2012. Document classification using multiple views. 10th IAPR International Workshop on Document Analysis Systems. IEEE Computer Society Washington, 33–37.
Abstract: The combination of multiple features or views when representing documents or other kinds of objects usually leads to improved results in classification (and retrieval) tasks. Most systems assume that those views will be available both at training and test time. However, some views may be too `expensive' to be available at test time. In this paper, we consider the use of Canonical Correlation Analysis to leverage `expensive' views that are available only at training time. Experimental results show that this information may significantly improve the results in a classification task.
|
|
|
Josep Llados, J. Lopez-Krahe and Enric Marti. 1999. A Hough-based method for hatched pattern detection in maps and diagrams..
|
|
|
Philippe Dosch and Josep Llados. 2003. Vectorial Signatures for Symbol Discrimination.
|
|
|
Josep Llados and Gemma Sanchez. 2003. Symbol Recognition Using Graphs.
|
|
|
Gemma Sanchez and Josep Llados. 2003. Syntactic models to represent perceptually regular repetitive patterns in graphic documents.
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez and Horst Bunke. 2009. On the use of textural features for writer identification in old handwritten music scores. 10th International Conference on Document Analysis and Recognition.996–1000.
Abstract: Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we present a system for writer identification in old handwritten music scores which uses only music notation to determine the author. The steps of the proposed system are the following. First of all, the music sheet is preprocessed for obtaining a music score without the staff lines. Afterwards, four different methods for generating texture images from music symbols are applied. Every approach uses a different spatial variation when combining the music symbols to generate the textures. Finally, Gabor filters and Grey-scale Co-ocurrence matrices are used to obtain the features. The classification is performed using a k-NN classifier based on Euclidean distance. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving encouraging identification rates.
|
|