|
Albert Berenguel, Oriol Ramos Terrades, Josep Llados, & Cristina Cañero. (2019). Recurrent Comparator with attention models to detect counterfeit documents. In 15th International Conference on Document Analysis and Recognition.
Abstract: This paper focuses on the detection of counterfeit documents via the recurrent comparison of the security textured background regions of two images. The main contributions are twofold. First, we apply and adapt a recurrent comparator architecture with an attention mechanism to the counterfeit detection task; it constructs a representation of the background regions by recurrently conditioning the next observation, learning the difference between genuine and counterfeit images through iterative glimpses. Second, we propose a new counterfeit document dataset to ensure the generalization of the learned model towards the detection of the lack of resolution during counterfeit manufacturing. As demonstrated in the evaluation, the presented network outperforms state-of-the-art classification approaches for counterfeit detection.
|
|
|
Albert Clapes, Julio C. S. Jacques Junior, Carla Morral, & Sergio Escalera. (2020). ChaLearn LAP 2020 Challenge on Identity-preserved Human Detection: Dataset and Results. In 15th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 801–808).
Abstract: This paper summarizes the ChaLearn Looking at People 2020 Challenge on Identity-preserved Human Detection (IPHD). For this purpose, we released a large novel dataset containing more than 112K pairs of spatiotemporally aligned depth and thermal frames (and 175K instances of humans) sampled from 780 sequences. The sequences contain hundreds of non-identifiable people appearing in a mix of in-the-wild and scripted scenarios recorded in public and private places. The competition was divided into three tracks depending on the modalities exploited for the detection: (1) depth, (2) thermal, and (3) depth-thermal fusion. Color was also captured but only used to facilitate the ground-truth annotation. Still, the temporal synchronization of three sensory devices is challenging, so bad temporal matches across modalities can occur. Hence, the labels provided should be considered “weak”, although test frames were carefully selected to minimize this effect and ensure the fairest comparison of the participants’ results. Despite this added difficulty, the results obtained by the participants demonstrate that current fully supervised methods can deal with it and achieve outstanding detection performance when measured in terms of AP@0.50.
|
|
|
Albert Clapes, Miguel Reyes, & Sergio Escalera. (2012). User Identification and Object Recognition in Clutter Scenes Based on RGB-Depth Analysis. In 7th Conference on Articulated Motion and Deformable Objects (Vol. 7378, pp. 1–11). LNCS. Springer Berlin Heidelberg.
Abstract: We propose an automatic system for user identification and object recognition based on multi-modal RGB-Depth data analysis. We model an RGBD environment by learning a pixel-based Gaussian background distribution. Then, user and object candidate regions are detected and recognized online using robust statistical approaches over RGBD descriptions. Finally, the system saves the history of user-object assignments, which is especially useful for surveillance scenarios. The system has been evaluated on a novel data set containing different indoor/outdoor scenarios, objects, and users, showing accurate recognition and better performance than standard state-of-the-art approaches.
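The per-pixel Gaussian background model mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single channel (for example depth) flattened into a list, a background fitted from frames known to contain no users or objects, and a fixed threshold of k = 3 standard deviations; the function names and parameters are hypothetical, and the abstract does not state the paper's actual features or thresholds.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: one channel (e.g. depth) of a frame,
# flattened row-major into a list of floats.
Frame = Sequence[float]


def fit_background(frames: Sequence[Frame]) -> Tuple[List[float], List[float]]:
    """Estimate a per-pixel Gaussian (mean, standard deviation) from background-only frames."""
    n = len(frames)
    num_pixels = len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(num_pixels)]
    stds = [
        math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / n)
        for i in range(num_pixels)
    ]
    return means, stds


def foreground_mask(
    frame: Frame,
    means: Sequence[float],
    stds: Sequence[float],
    k: float = 3.0,         # assumed threshold, in standard deviations
    min_std: float = 1e-3,  # floor so perfectly static pixels do not get zero variance
) -> List[bool]:
    """Flag pixels lying more than k standard deviations from their background mean."""
    return [abs(x - m) > k * max(s, min_std) for x, m, s in zip(frame, means, stds)]


# Usage: fit on three background frames, then test a new frame.
background = [[1.0, 2.0, 3.0], [1.2, 2.0, 3.1], [0.8, 2.0, 2.9]]
means, stds = fit_background(background)
mask = foreground_mask([1.0, 5.0, 3.0], means, stds)  # [False, True, False]
```

Candidate user or object regions would then be grown from the flagged pixels; that step, and the RGBD descriptions used for recognition, are not sketched here.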
|
|
|
Albert Clapes, Ozan Bilici, Dariia Temirova, Egils Avots, Gholamreza Anbarjafari, & Sergio Escalera. (2018). From apparent to real age: gender, age, ethnic, makeup, and expression bias analysis in real age estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 2373–2382).
|
|
|
Albert Clapes, Tinne Tuytelaars, & Sergio Escalera. (2017). Darwintrees for action recognition. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
|
|
|
Albert Gordo, Alicia Fornes, Ernest Valveny, & Josep Llados. (2010). A Bag of Notes Approach to Writer Identification in Old Handwritten Music Scores. In 9th IAPR International Workshop on Document Analysis Systems (pp. 247–254).
Abstract: Determining the authorship of a document, namely writer identification, can be an important source of information for document categorization. In contrast to text documents, identifying the writer of graphical documents is still a challenge. In this paper we present a robust approach for writer identification in a particular kind of graphical document: old music scores. This approach adapts the bag of visual terms method to cope with graphic documents. The identification is performed using only the graphical music notation. For this purpose, we generate a graphic vocabulary without recognizing any music symbols, consequently avoiding the difficulties of recognizing hand-drawn symbols in old and degraded documents. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving very high identification rates.
|
|
|
Albert Gordo, & Ernest Valveny. (2009). A rotation invariant page layout descriptor for document classification and retrieval. In 10th International Conference on Document Analysis and Recognition (pp. 481–485).
Abstract: Document classification usually requires structural features such as the physical layout to obtain good accuracy rates on complex documents. This paper introduces a layout descriptor and a distance measure based on cyclic dynamic time warping, which can be computed in O(n²). This descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover (MWEC). The experiments show that these methods outperform the MWEC in both accuracy and speed, particularly on rotated documents.
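To illustrate why a cyclic variant of dynamic time warping yields rotation invariance, here is a brute-force sketch. It assumes the layout descriptor is a circular sequence of scalars in which rotating the page corresponds to a cyclic shift; this is an assumption, since the abstract does not specify the descriptor. The naive version below tries every rotation and costs O(n·m²); it is not the O(n²) algorithm reported in the paper.

```python
from typing import List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] holds the cost of the best warping path aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def cyclic_dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Naive cyclic DTW: the best DTW distance over every cyclic rotation of b.

    Brute force, O(n * m^2); not the faster algorithm described in the paper.
    """
    b_list: List[float] = list(b)
    return min(dtw(a, b_list[k:] + b_list[:k]) for k in range(len(b_list)))


# A descriptor and a cyclically shifted copy of it, as a page rotation might produce.
a = [0.0, 1.0, 2.0, 3.0]
b = [2.0, 3.0, 0.0, 1.0]
plain = dtw(a, b)          # penalizes the shift
cyclic = cyclic_dtw(a, b)  # 0.0: some rotation of b matches a exactly
```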
|
|
|
Albert Gordo, & Ernest Valveny. (2009). The diagonal split: A pre-segmentation step for page layout analysis & classification. In 4th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 5524, pp. 290–297). LNCS. Springer Berlin Heidelberg.
Abstract: Document classification is an important task in all the processes related to document storage and retrieval. In the case of complex documents, structural features are needed to achieve a correct classification. Unfortunately, physical layout analysis is error prone. In this paper we present a pre-segmentation step based on a divide & conquer strategy that can be used to improve the page segmentation results, independently of the segmentation algorithm used. This pre-segmentation step is evaluated in classification and retrieval using the selective CRLA algorithm for layout segmentation together with a clustering based on the Voronoi area diagram, and tested on two different databases, MARG and Girona Archives.
|
|
|
Albert Gordo, & Florent Perronnin. (2010). A Bag-of-Pages Approach to Unordered Multi-Page Document Classification. In 20th International Conference on Pattern Recognition (pp. 1920–1923).
Abstract: We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system.
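The basic bag-of-pages representation described above (assign every page to a prototype, then count) can be sketched as follows. The per-page feature vectors, the Euclidean nearest-prototype rule, and the codebook (which would typically be learned offline, for example with k-means) are all assumptions for illustration; the refinements mentioned in the abstract are not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_prototype(page: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook prototype closest (Euclidean) to a page's feature vector."""
    return min(range(len(codebook)), key=lambda k: math.dist(page, codebook[k]))


def bag_of_pages(
    pages: Sequence[Vector], codebook: Sequence[Vector], normalize: bool = True
) -> List[float]:
    """Histogram of prototype assignments; page order does not affect the result."""
    hist = [0.0] * len(codebook)
    for page in pages:
        hist[nearest_prototype(page, codebook)] += 1.0
    if normalize and pages:
        total = sum(hist)
        hist = [h / total for h in hist]
    return hist


codebook = [[0.0, 0.0], [10.0, 10.0]]            # hypothetical codebook of page prototypes
document = [[1.0, 1.0], [9.0, 9.0], [0.0, 1.0]]  # one hypothetical feature vector per page
hist = bag_of_pages(document, codebook)          # roughly [0.667, 0.333]
```

The resulting fixed-length histogram is what would be passed to a discriminative classifier, regardless of how many pages the document has or in which order they arrive.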
|
|
|
Albert Gordo, & Florent Perronnin. (2011). Asymmetric Distances for Binary Embeddings. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 729–736).
Abstract: In large-scale query-by-example retrieval, embedding image signatures in a binary space offers two benefits: data compression and search efficiency. While most embedding algorithms binarize both query and database signatures, it has been noted that this is not strictly a requirement. Indeed, asymmetric schemes which binarize the database signatures but not the query still enjoy the same two benefits but may provide superior accuracy. In this work, we propose two general asymmetric distances which are applicable to a wide variety of embedding techniques including Locality Sensitive Hashing (LSH), Locality Sensitive Binary Codes (LSBC), Spectral Hashing (SH) and Semi-Supervised Hashing (SSH). We experiment on four public benchmarks containing up to 1M images and show that the proposed asymmetric distances consistently lead to large improvements over the symmetric Hamming distance for all binary embedding techniques. We also propose a novel, simple binary embedding technique, PCA Embedding (PCAE), which is shown to yield competitive results with respect to more complex algorithms such as SH and SSH.
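The difference between symmetric and asymmetric comparison can be made concrete with a small sketch. It assumes sign-of-projection binarization (as in LSH-style schemes) and uses one simple asymmetric score, in which each bit where the real-valued query projection disagrees with the stored bit costs |y_k| instead of 1. This score is an illustrative assumption and is not necessarily either of the two distances proposed in the paper; the projection matrix `W` is hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def project(W: Sequence[Vector], x: Vector) -> List[float]:
    """Linear projections y_k = <w_k, x>; W could hold random (LSH-style) or PCA directions."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def binarize(y: Sequence[float]) -> List[int]:
    """Sign binarization: bit k is 1 if y_k > 0."""
    return [1 if v > 0 else 0 for v in y]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Symmetric distance: both query and database item are binarized."""
    return sum(ai != bi for ai, bi in zip(a, b))


def asymmetric_distance(y_query: Sequence[float], bits: Sequence[int]) -> float:
    """Query stays real-valued: each disagreeing bit costs |y_k| instead of 1."""
    return sum(abs(v) for v, bit in zip(y_query, bits) if (v > 0) != (bit == 1))


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                            # hypothetical projections
db = [binarize(project(W, x)) for x in ([0.9, -0.1], [-0.1, 0.9])]  # only bits are stored
y_q = project(W, [0.05, 0.02])                 # query projections, never binarized
sym = [hamming(binarize(y_q), b) for b in db]  # a tie: both items at distance 1
asym = [asymmetric_distance(y_q, b) for b in db]  # smaller for the first item
```

In this toy example the query is in fact closer to the first database vector, which the symmetric Hamming distance cannot detect but the asymmetric score does, while the database side still stores only bits.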
|
|
|
Albert Gordo, Florent Perronnin, & Ernest Valveny. (2012). Document classification using multiple views. In 10th IAPR International Workshop on Document Analysis Systems (pp. 33–37). IEEE Computer Society Washington.
Abstract: The combination of multiple features or views when representing documents or other kinds of objects usually leads to improved results in classification (and retrieval) tasks. Most systems assume that those views will be available both at training and test time. However, some views may be too ‘expensive’ to be available at test time. In this paper, we consider the use of Canonical Correlation Analysis to leverage ‘expensive’ views that are available only at training time. Experimental results show that this information may significantly improve the results in a classification task.
|
|
|
Albert Gordo, Jaume Gibert, Ernest Valveny, & Marçal Rusiñol. (2010). A Kernel-based Approach to Document Retrieval. In 9th IAPR International Workshop on Document Analysis Systems (pp. 377–384).
Abstract: In this paper we tackle the problem of document image retrieval by combining a similarity measure between documents and the probability that a given document belongs to a certain class. The membership probability to a specific class is computed using Support Vector Machines in conjunction with a similarity-measure-based kernel applied to structural document representations. In the presented experiments, we use different document representations, both visual and structural, and we apply them to a database of historical documents. We show how our method based on similarity kernels outperforms the usual distance-based retrieval.
|
|
|
Albert Gordo, Jose Antonio Rodriguez, Florent Perronnin, & Ernest Valveny. (2012). Leveraging category-level labels for instance-level image retrieval. In 25th IEEE Conference on Computer Vision and Pattern Recognition (pp. 3045–3052). IEEE Xplore.
Abstract: In this article, we focus on the problem of large-scale instance-level image retrieval. For efficiency reasons, it is common to represent an image by a fixed-length descriptor which is subsequently encoded into a small number of bits. We note that most encoding techniques include an unsupervised dimensionality reduction step. Our goal in this work is to learn a better subspace in a supervised manner. We especially raise the following question: “can category-level labels be used to learn such a subspace?” To answer this question, we experiment with four learning techniques: the first one is based on a metric learning framework, the second one on attribute representations, the third one on Canonical Correlation Analysis (CCA) and the fourth one on Joint Subspace and Classifier Learning (JSCL). While the first three approaches have been applied in the past to the image retrieval problem, we believe we are the first to show the usefulness of JSCL in this context. In our experiments, we use ImageNet as a source of category-level labels and report retrieval results on two standard datasets: INRIA Holidays and the University of Kentucky benchmark. Our experimental study shows that metric learning and attributes do not lead to any significant improvement in retrieval accuracy, as opposed to CCA and JSCL. As an example, we report on Holidays an increase in accuracy from 39.3% to 48.6% with 32-dimensional representations. Overall, JSCL is shown to yield the best results.
|
|
|
Albert Gordo, Marçal Rusiñol, Dimosthenis Karatzas, & Andrew Bagdanov. (2013). Document Classification and Page Stream Segmentation for Digital Mailroom Applications. In 12th International Conference on Document Analysis and Recognition (pp. 621–625).
Abstract: In this paper we present a method for the segmentation of continuous page streams into multipage documents and the simultaneous classification of the resulting documents. We first present an approach to combine the multiple pages of a document into a single feature vector that represents the whole document. Despite its simplicity and low computational cost, the proposed representation yields results comparable to more complex methods in multipage document classification tasks. We then exploit this representation in the context of page stream segmentation. The most plausible segmentation of a page stream into a sequence of multipage documents is obtained by optimizing a statistical model that represents the probability of each segmented multipage document belonging to a particular class. Experimental results are reported on a large sample of real administrative multipage documents.
|
|
|
Albert Rial-Farras, Meysam Madadi, & Sergio Escalera. (2021). UV-based reconstruction of 3D garments from a single RGB image. In 16th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 1–8).
Abstract: Garments are highly detailed and dynamic objects made up of particles that interact with each other and with other objects, making the task of 2D to 3D garment reconstruction extremely challenging. Therefore, having a lightweight 3D representation capable of modelling fine details is of great importance. This work presents a deep learning framework based on Generative Adversarial Networks (GANs) to reconstruct 3D garment models from a single RGB image. It has the peculiarity of using UV maps to represent 3D data, a lightweight representation capable of dealing with high-resolution details and wrinkles. With this model and kind of 3D representation, we achieve state-of-the-art results on the CLOTH3D++ dataset, generating good-quality and realistic garment reconstructions regardless of garment topology and shape, human pose, occlusions and lighting.
|
|