Muhammad Muzzamil Luqman, Thierry Brouard, Jean-Yves Ramel, & Josep Llados. (2012). Recherche de sous-graphes par encapsulation floue des cliques d'ordre 2: Application à la localisation de contenu dans les images de documents graphiques. In Colloque International Francophone sur l'Écrit et le Document (pp. 149–162).
|
Muhammad Muzzamil Luqman, Thierry Brouard, Jean-Yves Ramel, & Josep Llados. (2010). Vers une approche floue d'encapsulation de graphes: application à la reconnaissance de symboles. In Colloque International Francophone sur l'Écrit et le Document (pp. 169–184).
Abstract: We present a new methodology for symbol recognition that employs a structural approach for representing visual associations in symbols and a statistical classifier for recognition. A graphic symbol is vectorized, its topological and geometrical details are encoded by an attributed relational graph and a signature is computed for it. Data-adapted fuzzy intervals have been introduced for addressing the sensitivity of structural representations to noise. The joint probability distribution of signatures is encoded by a Bayesian network, which serves as a mechanism for pruning irrelevant features and choosing a subset of interesting features from the structural signatures of the underlying symbol set, and is deployed in a supervised learning scenario for recognizing query symbols. Experimental results on pre-segmented 2D linear architectural and electronic symbols from the GREC databases are presented.
Keywords: Fuzzy interval; Graph embedding; Bayesian network; Symbol recognition
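As a rough illustration of why fuzzy intervals reduce noise sensitivity, the sketch below builds a fuzzy histogram of one numeric graph attribute (for example, relative edge lengths). It is a sketch under stated assumptions, not the authors' method: the trapezoidal interval bounds are hypothetical and fixed by hand, whereas the paper adapts them to the data, and `membership` and `fuzzy_histogram` are illustrative names.

```python
from typing import List, Tuple

# A trapezoidal fuzzy interval (a, b, c, d): full membership on [b, c],
# linear ramps on (a, b) and (c, d). These bounds are hypothetical.
Trapezoid = Tuple[float, float, float, float]


def membership(x: float, t: Trapezoid) -> float:
    a, b, c, d = t
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)  # rising ramp
    if c < x < d:
        return (d - x) / (d - c)  # falling ramp
    return 0.0


def fuzzy_histogram(values: List[float], intervals: List[Trapezoid]) -> List[float]:
    # Each value votes into every interval it partially belongs to, so a value
    # near a bin boundary splits its weight instead of jumping between bins.
    return [sum(membership(v, t) for v in values) for t in intervals]


# Overlapping intervals with complementary ramps (hypothetical bounds).
intervals = [(-1.0, 0.0, 0.3, 0.5), (0.3, 0.5, 0.7, 0.9), (0.7, 0.9, 1.0, 2.0)]
signature = fuzzy_histogram([0.2, 0.4, 0.95], intervals)
```

Because neighbouring trapezoids overlap with complementary ramps, the memberships of any value in [0, 1] sum to 1, so a small perturbation of an attribute shifts weight gradually between adjacent bins of the signature.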
|
Muhammad Muzzamil Luqman, Thierry Brouard, Jean-Yves Ramel, & Josep Llados. (2010). A Content Spotting System For Line Drawing Graphic Document Images. In 20th International Conference on Pattern Recognition (Vol. 20, pp. 3420–3423).
Abstract: We present a content spotting system for line drawing graphic document images. The proposed system is sufficiently domain-independent and takes keyword-based information retrieval for graphic documents one step further, to Query By Example (QBE) and focused retrieval. In the offline learning mode, we vectorize the documents in the repository, represent them by attributed relational graphs, extract regions of interest (ROIs) from them, convert each ROI to a fuzzy structural signature, cluster similar signatures to form ROI classes and build an index for the repository. In the online querying mode, a Bayesian network classifier recognizes the ROIs in the query image and the corresponding documents are fetched by looking them up in the repository index. Experimental results are presented for synthetic images of architectural and electronic documents.
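The repository index can be pictured as an inverted index from ROI classes to documents. The sketch below is a minimal version under stated assumptions: ROI signatures have already been clustered offline into integer class ids, the Bayesian network classifier has already assigned a class id to each query ROI, and documents are ranked by how many distinct query classes they contain (a ranking rule chosen here for illustration, not taken from the paper). `build_index` and `query` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(roi_labels: Iterable[Tuple[str, int]]) -> Dict[int, Set[str]]:
    """Offline mode: map each ROI class id to the documents that contain it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for doc_id, roi_class in roi_labels:
        index[roi_class].add(doc_id)
    return dict(index)


def query(index: Dict[int, Set[str]], query_classes: Iterable[int]) -> List[str]:
    """Online mode: rank documents by the number of recognized query classes they contain."""
    hits: Dict[str, int] = defaultdict(int)
    for roi_class in set(query_classes):
        for doc_id in index.get(roi_class, ()):
            hits[doc_id] += 1
    # Most matches first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda d: (-hits[d], d))


idx = build_index([("plan_a", 0), ("plan_a", 2), ("plan_b", 2), ("circuit_c", 5)])
results = query(idx, [2, 0])
```

Here `results` lists `plan_a` before `plan_b`, because `plan_a` contains both recognized classes while `plan_b` contains only one.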
|
Murad Al Haj. (2008). Face Detection in Color Images Using Primitive Shape Features.
|
Murad Al Haj. (2013). Looking at Faces: Detection, Tracking and Pose Estimation (Jordi Gonzalez, & Xavier Roca, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Humans can effortlessly perceive faces, follow them over space and time, and decode their rich content, such as pose, identity and expression. However, despite many decades of research on automatic facial perception in areas like face detection, expression recognition, pose estimation and face recognition, and despite many successes, a complete solution remains elusive. This thesis is dedicated to three problems in automatic face perception, namely face detection, face tracking and pose estimation.
In face detection, an initial simple model is presented that uses pixel-based heuristics to segment skin locations and hand-crafted rules to determine the locations of the faces present in an image. Different colorspaces are studied to judge whether a colorspace transformation can aid skin color detection. The output of this study is used in the design of a more complex face detector that is able to successfully generalize to different scenarios.
In face tracking, a framework that combines estimation and control in a joint scheme is presented to track a face with a single pan-tilt-zoom camera. While this work is mainly motivated by tracking faces, it can easily be applied on top of any detector to track different objects. The applicability of this method is demonstrated in simulated as well as real-life scenarios.
The last and most important part of this thesis is dedicated to monocular head pose estimation. In this part, a method based on partial least squares (PLS) regression is proposed to estimate pose and solve the alignment problem simultaneously. The contributions of this work are two-fold: 1) demonstrating that the proposed method achieves better than state-of-the-art results on the estimation problem and 2) developing a technique to reduce misalignment based on the learned PLS factors that outperforms multiple instance learning (MIL) without the need for any re-training or the inclusion of misaligned samples in the training process, as is normally done in MIL.
|
Murad Al Haj, Andrew Bagdanov, Jordi Gonzalez, & Xavier Roca. (2009). Robust and Efficient Multipose Face Detection Using Skin Color Segmentation. In 4th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 5524). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we describe an efficient technique for detecting faces in arbitrary images and video sequences. The approach is based on segmentation of images or video frames into skin-colored blobs using a pixel-based heuristic. Scale and translation invariant features are then computed from these segmented blobs which are used to perform statistical discrimination between face and non-face classes. We train and evaluate our method on a standard, publicly available database of face images and analyze its performance over a range of statistical pattern classifiers. The generalization of our approach is illustrated by testing on an independent sequence of frames containing many faces and non-faces. These experiments indicate that our proposed approach obtains false positive rates comparable to more complex, state-of-the-art techniques, and that it generalizes better to new data. Furthermore, the use of skin blobs and invariant features requires fewer training samples since significantly fewer non-face candidate regions must be considered when compared to AdaBoost-based approaches.
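The abstract does not spell out its pixel-based heuristic, so the sketch below substitutes a commonly cited explicit RGB skin rule for uniform daylight illumination, purely to show what a per-pixel skin segmentation step looks like. The thresholds belong to that substituted rule, not to this paper, and `skin_mask` is a hypothetical name; grouping the mask into blobs and computing the invariant features is not shown.

```python
import numpy as np


def skin_mask(rgb: np.ndarray) -> np.ndarray:
    """Boolean skin mask for an H x W x 3 uint8 RGB image (substituted explicit rule)."""
    img = rgb.astype(np.int16)  # widen first so differences cannot wrap around in uint8
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    spread = img.max(axis=-1) - img.min(axis=-1)
    return (
        (r > 95) & (g > 40) & (b > 20)  # minimum brightness per channel
        & (spread > 15)                 # enough colour saturation
        & (np.abs(r - g) > 15)          # red clearly separated from green
        & (r > g) & (r > b)             # red is the dominant channel
    )


# Three test pixels: a skin-like tone, a grey, and a green-dominant colour.
pixels = np.array([[[200, 120, 90], [50, 50, 50], [120, 200, 90]]], dtype=np.uint8)
mask = skin_mask(pixels)
```

Such a fixed rule is cheap but sensitive to illumination, which is one reason the thesis studies whether a colorspace transformation helps skin detection.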
|
Murad Al Haj, Andrew Bagdanov, Jordi Gonzalez, & Xavier Roca. (2010). Reactive object tracking with a single PTZ camera. In 20th International Conference on Pattern Recognition (pp. 1690–1693).
Abstract: In this paper we describe a novel approach to reactive tracking of moving targets with a pan-tilt-zoom camera. The approach uses an extended Kalman filter to jointly track the object position in the real world, its velocity in 3D and the camera intrinsics, in addition to the rate of change of these parameters. The filter outputs are used as inputs to PID controllers which continuously adjust the camera motion in order to reactively track the object at a constant image velocity while simultaneously maintaining a desirable target scale in the image plane. We provide experimental results on simulated and real tracking sequences to show how our tracker is able to accurately estimate both 3D object position and camera intrinsics with very high precision over a wide range of focal lengths.
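To make the control side concrete, the sketch below implements a textbook discrete PID controller for one camera axis and closes the loop around a toy plant in which the pan angle simply integrates the commanded rate. This is an assumption-laden illustration: the paper feeds extended Kalman filter estimates into its PID controllers, whereas here the raw angular error is used directly, and the gains, `PID` and `simulate_pan` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete PID controller for one camera axis (e.g. pan rate)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        # Rectangular (Euler) accumulation of the integral term.
        self.integral += error * dt
        # No derivative on the first call, to avoid a spurious kick.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_pan(target: float = 10.0, dt: float = 0.05, steps: int = 400) -> float:
    """Toy closed loop: the camera pan angle integrates the commanded pan rate."""
    pid = PID(kp=2.0, ki=0.5, kd=0.05)  # hypothetical gains, not from the paper
    angle = 0.0
    for _ in range(steps):
        rate = pid.step(target - angle, dt)  # commanded pan velocity
        angle += rate * dt
    return angle
```

The integral term matters for this plant: with proportional action alone, a target moving at constant velocity would be followed with a constant lag, while the integral term drives that lag towards zero.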
|
Murad Al Haj, Carles Fernandez, Zhanwu Xiong, Ivan Huerta, Jordi Gonzalez, & Xavier Roca. (2011). Beyond the Static Camera: Issues and Trends in Active Vision. In Th.B. Moeslund, A. Hilton, V. Krüger, & L. Sigal (Eds.), Visual Analysis of Humans: Looking at People (pp. 11–30). Springer London.
Abstract: Maximizing both the area coverage and the resolution per target is highly desirable in many applications of computer vision. However, with a limited number of cameras viewing a scene, the two objectives are contradictory. This chapter is dedicated to active vision systems, which try to achieve a trade-off between these two aims, and examines the use of high-level reasoning in such scenarios. The chapter starts by introducing different approaches to active camera configurations. Next, a single active camera system to track a moving object is developed, offering the reader first-hand understanding of the issues involved. Another section discusses practical considerations in building an active vision platform, taking as an example a multi-camera system developed for a European project. The last section of the chapter reflects upon future trends in using semantic factors to drive smartly coordinated active systems.
|
Murad Al Haj, Francisco Javier Orozco, Jordi Gonzalez, & Juan J. Villanueva. (2008). Automatic Face and Facial Features Initialization for Robust and Accurate Tracking. In 19th International Conference on Pattern Recognition (pp. 1–4).
|
Murad Al Haj, Jordi Gonzalez, & Larry S. Davis. (2012). On Partial Least Squares in Head Pose Estimation: How to simultaneously deal with misalignment. In 25th IEEE Conference on Computer Vision and Pattern Recognition (pp. 2602–2609). IEEE Xplore.
Abstract: Head pose estimation is a critical problem in many computer vision applications. These include human-computer interaction, video surveillance, and face and expression recognition. In most prior work on head pose estimation, the positions of the faces whose pose is to be estimated are specified manually. Therefore, the results are reported without studying the effect of misalignment. We propose a method based on partial least squares (PLS) regression to estimate pose and solve the alignment problem simultaneously. The contributions of this paper are two-fold: 1) we show that the kernel version of PLS (kPLS) achieves better than state-of-the-art results on the estimation problem and 2) we develop a technique to reduce misalignment based on the learned PLS factors.
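For readers unfamiliar with PLS regression, the sketch below fits a linear PLS1 model (one response, such as a yaw angle) with the NIPALS algorithm in NumPy. It is deliberately simpler than the paper: it is the linear variant, not kernel PLS (kPLS), it does not include the misalignment-reduction technique built on the learned PLS factors, and `pls1_fit` and `pls1_predict` are hypothetical names. The input rows would be image descriptors of face crops in a pose setting; the test uses synthetic data.

```python
from typing import List, Tuple

import numpy as np

Model = Tuple[np.ndarray, np.ndarray, float]  # (coefficients, x_mean, y_mean)


def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int) -> Model:
    """Linear PLS1 via NIPALS: extract latent factors, then fold them into coefficients."""
    x_mean, y_mean = X.mean(axis=0), float(y.mean())
    Xk, yk = X - x_mean, y - y_mean
    W: List[np.ndarray] = []
    P: List[np.ndarray] = []
    Q: List[float] = []
    for _ in range(n_components):
        w = Xk.T @ yk                # weight: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # latent scores (the "PLS factor")
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        q = float(yk @ t) / tt       # y loading
        Xk = Xk - np.outer(t, p)     # deflate X and y before the next factor
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    Wm, Pm = np.array(W).T, np.array(P).T
    coef = Wm @ np.linalg.solve(Pm.T @ Wm, np.array(Q))
    return coef, x_mean, y_mean


def pls1_predict(model: Model, X_new: np.ndarray) -> np.ndarray:
    coef, x_mean, y_mean = model
    return (X_new - x_mean) @ coef + y_mean
```

With as many components as input dimensions, PLS1 reduces to ordinary least squares; the point of PLS is to use far fewer components than dimensions, which suits high-dimensional appearance features.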
|
Mustafa Hajij, Mathilde Papillon, Florian Frantzen, Jens Agerberg, Ibrahem AlJabea, Ruben Ballester, et al. (2024). TopoX: A Suite of Python Packages for Machine Learning on Topological Domains.
Abstract: We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under the MIT license.
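The TopoX packages have their own APIs, which are not reproduced here. As a library-free sketch of the kind of object they handle, the NumPy code below builds the signed boundary (incidence) matrices of a single filled triangle, a tiny simplicial complex, and uses them for a linear node-to-edge message and the edge Hodge Laplacian, both standard ingredients of higher-order message passing. All function and variable names are illustrative only.

```python
from itertools import combinations
from typing import List, Tuple

import numpy as np

Simplex = Tuple[int, ...]


def boundary_matrix(faces: List[Simplex], cofaces: List[Simplex]) -> np.ndarray:
    """Signed incidence between (k-1)-simplices (rows) and k-simplices (columns)."""
    row = {f: i for i, f in enumerate(faces)}
    B = np.zeros((len(faces), len(cofaces)), dtype=int)
    for j, s in enumerate(cofaces):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]      # drop the i-th vertex
            B[row[face], j] = (-1) ** i   # alternating orientation sign
    return B


# A single filled triangle: vertices, edges and the 2-cell, as sorted tuples.
nodes = [(v,) for v in range(3)]
edges = list(combinations(range(3), 2))      # (0, 1), (0, 2), (1, 2)
triangles = list(combinations(range(3), 3))  # (0, 1, 2)

B1 = boundary_matrix(nodes, edges)      # 3 x 3: nodes x edges
B2 = boundary_matrix(edges, triangles)  # 3 x 1: edges x triangles
L1 = B1.T @ B1 + B2 @ B2.T              # Hodge Laplacian acting on edge signals

x_nodes = np.array([1.0, 2.0, 4.0])     # a signal on the nodes
edge_msg = B1.T @ x_nodes               # each edge receives head minus tail
```

A useful sanity check is that `B1 @ B2` is the zero matrix: the boundary of a boundary vanishes, which is what makes these matrices consistent operators between cells of different orders.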
|
N. Nayef, F. Yin, I. Bizid, H. Choi, Y. Feng, Dimosthenis Karatzas, et al. (2017). ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification – RRC-MLT. In 14th International Conference on Document Analysis and Recognition (pp. 1454–1459).
Abstract: Text detection and recognition in natural environments are key components of many applications, ranging from business card digitization to indexing shops along a street. This competition aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text (MLT) in scene images, such as content gathered from Internet media and images of modern cities where multiple cultures live and communicate together. This competition is an extension of the Robust Reading Competition (RRC), which has been held since 2003 both at ICDAR and in an online context. The proposed competition is presented as a new challenge of the RRC. The dataset built for this challenge largely extends the previous RRC editions in many aspects: the multi-lingual text, the size of the dataset, the multi-oriented text, and the wide variety of scenes. The dataset comprises 18,000 images containing text in 9 languages. The challenge comprises three tasks related to text detection and script classification. We received a total of 16 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of this RRC-MLT challenge.
|
N. Pares, & J.R. Serra. (1992). Tailleur: El problema del sastre.
|
N. Serrano, L. Tarazon, D. Perez, Oriol Ramos Terrades, & S. Juan. (2010). The GIDOC Prototype. In 10th International Workshop on Pattern Recognition in Information Systems (pp. 82–89).
Abstract: Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. It might be carried out by first processing all document images off-line, and then manually supervising system transcriptions to edit incorrect parts. However, current techniques for automatic page layout analysis, text line detection and handwriting recognition are still far from perfect, and thus post-editing system output is not clearly better than simply ignoring it.
A more effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which the system is guided by the user and the user is assisted by the system to complete the transcription task as efficiently as possible. Following this approach, a system prototype called GIDOC (Gimp-based Interactive transcription of old text DOCuments) has been developed to provide user-friendly, integrated support for interactive-predictive layout analysis, line detection and handwriting transcription.
GIDOC is designed to work with (large) collections of homogeneous documents, that is, documents of similar structure and writing style. They are annotated sequentially, by (partially) supervising hypotheses drawn from statistical models that are constantly updated as more annotated documents become available. This is done at several annotation levels. For instance, at the level of page layout analysis, GIDOC uses a novel text block detection method in which conventional, memoryless techniques are improved with a “history” model of text block positions. Similarly, at the level of text line image transcription, GIDOC includes a handwriting recognizer that is steadily improved with a growing number of (partially) supervised transcriptions.
|
N. Zakaria, Jean-Marc Ogier, & Josep Llados. (2005). On-line Graphics Recognition based on Invariant Spatio-Sequential Descriptor: Fuzzy Matrix.
|