Sounak Dey, Pau Riba, Anjan Dutta, Josep Llados, & Yi-Zhe Song. (2019). Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2179–2188).
Abstract: In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning across 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of ones included in existing datasets that can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos into a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, retrieval performance significantly outperforms that of state-of-the-art on existing datasets that can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research.
|
Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, & Vladlen Koltun. (2017). CARLA: An Open Urban Driving Simulator. In 1st Annual Conference on Robot Learning. Proceedings of Machine Learning (Vol. 78, pp. 1–16).
Abstract: We introduce CARLA, an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites and environmental conditions. We use CARLA to study the performance of three approaches to autonomous driving: a classic modular pipeline, an end-to-end model trained via imitation learning, and an end-to-end model trained via reinforcement learning. The approaches are evaluated in controlled scenarios of increasing difficulty, and their performance is examined via metrics provided by CARLA, illustrating the platform’s utility for autonomous driving research.
Keywords: Autonomous driving; sensorimotor control; simulation
|
Fadi Dornaika, Bogdan Raducanu, & Alireza Bosaghzadeh. (2015). Facial expression recognition based on multi observations with application to social robotics. In Bruce Flores (Ed.), Emotional and Facial Expressions: Recognition, Developmental Differences and Social Importance (pp. 153–166). Nova Science publishers.
Abstract: Human-robot interaction is a hot topic nowadays in the social robotics community. One crucial aspect is represented by the affective communication which comes encoded through the facial expressions. In this chapter, we propose a novel approach for facial expression recognition, which exploits an efficient and adaptive graph-based label propagation (semi-supervised mode) in a multi-observation framework. The facial features are extracted using an appearance-based 3D face tracker, view- and texture-independent. Our method has been extensively tested on the CMU dataset, and has been conveniently compared with other methods for graph construction. With the proposed approach, we developed an application for an AIBO robot, in which it mirrors the recognized facial expression.
|
Anjan Dutta, Umapada Pal, & Josep Llados. (2016). Compact Correlated Features for Writer Independent Signature Verification. In 23rd International Conference on Pattern Recognition.
Abstract: This paper considers the offline signature verification problem which is considered to be an important research line in the field of pattern recognition. In this work we propose hybrid features that consider the local features and their global statistics in the signature image. This has been done by creating a vocabulary of histogram of oriented gradients (HOGs). We impose weights on these local features based on the height information of water reservoirs obtained from the signature. Spatial information between local features is thought to play a vital role in considering the geometry of the signatures which distinguishes the originals from the forged ones. Nevertheless, learning a condensed set of higher order neighbouring features based on visual words, e.g., doublets and triplets, continues to be a challenging problem as possible combinations of visual words grow exponentially. To avoid this explosion of size, we create a code of local pairwise features which are represented as joint descriptors. Local features are paired based on the edges of a graph representation built upon the Delaunay triangulation. We reveal the advantage of combining both types of visual codebooks (order one and pairwise) for the signature verification task. This is validated through an encouraging result on two benchmark datasets, viz. CEDAR and GPDS300.
|
Fadi Dornaika, & Bogdan Raducanu. (2013). Out-of-Sample Embedding for Manifold Learning Applied to Face Recognition. In IEEE International Workshop on Analysis and Modeling of Faces and Gestures (pp. 862–868).
Abstract: Manifold learning techniques are affected by two critical aspects: (i) the design of the adjacency graphs, and (ii) the embedding of new test data (the out-of-sample problem). For the first aspect, the proposed schemes were heuristically driven. For the second aspect, the difficulty resides in finding an accurate mapping that transfers unseen data samples into an existing manifold. Past works addressing these two aspects were heavily parametric in the sense that the optimal performance is only reached for a suitable parameter choice that should be known in advance. In this paper, we demonstrate that sparse coding theory not only serves for automatic graph reconstruction as shown in recent works, but also represents an accurate alternative for out-of-sample embedding. Considering for a case study the Laplacian Eigenmaps, we applied our method to the face recognition problem. To evaluate the effectiveness of the proposed out-of-sample embedding, experiments are conducted using the k-nearest neighbor (KNN) and Kernel Support Vector Machines (KSVM) classifiers on four public face databases. The experimental results show that the proposed model is able to achieve high categorization effectiveness as well as high consistency with non-linear embeddings/manifolds obtained in batch modes.
|
Fadi Dornaika, & Bogdan Raducanu. (2012). Analysis and Recognition of Facial Expressions in Videos Using Facial Shape Deformation. In S.E. Carter (Ed.), Facial Expressions: Dynamic Patterns, Impairments and Social Perceptions (pp. 157–178). NOVA Publishers.
|
Fadi Dornaika, & Bogdan Raducanu. (2011). Subtle Facial Expression Recognition in Still Images and Videos. In Yu-Jin Zhang (Ed.), Advances in Face Image Analysis: Techniques and Technologies (pp. 259–277). New York, USA: IGI-Global.
Abstract: This chapter addresses the recognition of basic facial expressions. It has three main contributions. First, the authors introduce a view- and texture-independent scheme that exploits facial action parameters estimated by an appearance-based 3D face tracker. They represent the learned facial actions associated with different facial expressions by time series. Two dynamic recognition schemes are proposed: (1) the first is based on conditional predictive models and on an analysis-synthesis scheme, and (2) the second is based on examples allowing straightforward use of machine learning approaches. Second, the authors propose an efficient recognition scheme based on the detection of keyframes in videos. Third, the authors compare the dynamic scheme with a static one based on analyzing individual snapshots and show that in general the former performs better than the latter. The authors then provide evaluations of performance using Linear Discriminant Analysis (LDA), Nonparametric Discriminant Analysis (NDA), and Support Vector Machines (SVM).
|
Fadi Dornaika, & Franck Davoine. (2006). On appearance based face and facial action tracking. IEEE Transactions on Circuits and Systems for Video Technology, 16(9): 1838–1853.
|
Fadi Dornaika, & Franck Davoine. (2006). Facial expression recognition using auto-regressive models.
|
Fadi Dornaika, & Franck Davoine. (2005). Simultaneous Facial Action Tracking and Expression Recognition using a Particle Filter.
|
Fadi Dornaika, & Franck Davoine. (2005). SFM for planar scenes using image derivatives.
|
Fadi Dornaika, & Franck Davoine. (2005). Facial expression recognition in continuous videos using dynamic programming.
|
Fadi Dornaika, & J. Ahlberg. (2006). Fitting 3D face models for tracking and active appearance model training. Image and Vision Computing, 24(9): 1010–1024.
|
Sounak Dey, Anguelos Nicolaou, Josep Llados, & Umapada Pal. (2019). Evaluation of the Effect of Improper Segmentation on Word Spotting. IJDAR - International Journal on Document Analysis and Recognition, 22, 361–374.
Abstract: Word spotting is an important recognition task in large-scale retrieval of document collections. In most cases, methods are developed and evaluated assuming perfect word segmentation. In this paper, we propose an experimental framework to quantify the effect that word segmentation has on the performance achieved by word spotting methods in identical unbiased conditions. The framework consists of generating systematic distortions on segmentation and retrieving the original queries from the distorted dataset. We have tested our framework on several established and state-of-the-art methods using the George Washington and Barcelona Marriage datasets. The experiments allow for an estimate of the end-to-end performance of word spotting methods.
|
Sounak Dey, Anguelos Nicolaou, Josep Llados, & Umapada Pal. (2016). Local Binary Pattern for Word Spotting in Handwritten Historical Document. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 574–583). LNCS.
Abstract: Digital libraries store images which can be highly degraded, and to index such images we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties in complex layout analysis, large variations of writing styles, and degradation or low quality of historical manuscripts. This paper presents a simple, innovative, learning-free method for word spotting from large-scale historical documents combining Local Binary Pattern (LBP) and spatial sampling. This method offers three advantages: first, it operates in a completely learning-free paradigm, which is very different from unsupervised learning methods; second, the computational time is significantly low because the LBP features are very fast to compute; and third, the method can be used in scenarios where annotations are not available. Finally, we compare the results of our proposed retrieval method with other methods in the literature and obtain the best results in the learning-free paradigm.
Keywords: Local binary patterns; Spatial sampling; Learning-free; Word spotting; Handwritten; Historical document analysis; Large-scale data
|