|
Jose Carlos Rubio, Joan Serrat, Antonio Lopez and N. Paragios. 2012. Image Contextual Representation and Matching through Hierarchies and Higher Order Graphs. 21st International Conference on Pattern Recognition.2664–2667.
Abstract: We present a region matching algorithm which establishes correspondences between regions from two segmented images. An abstract graph-based representation conceals the image in a hierarchical graph, exploiting the scene properties at two levels. First, the similarity and spatial consistency of the image semantic objects is encoded in a graph of commute times. Second, the cluttered regions of the semantic objects are represented with a shape descriptor. Many-to-many matching of regions is specially challenging due to the instability of the segmentation under slight image changes, and we explicitly handle it through high order potentials. We demonstrate the matching approach applied to images of world famous buildings, captured under different conditions, showing the robustness of our method to large variations in illumination and viewpoint.
|
|
|
Idoia Ruiz, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder and Joan Serrat. 2021. Weakly Supervised Multi-Object Tracking and Segmentation. IEEE Winter Conference on Applications of Computer Vision Workshops.125–133.
Abstract: We introduce the problem of weakly supervised MultiObject Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation.
To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by
Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the
objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just 12% and 12.7 % for cars and pedestrians, respectively.
|
|
|
Mohammad Rouhani and Angel Sappa. 2012. Non-Rigid Shape Registration: A Single Linear Least Squares Framework. 12th European Conference on Computer Vision. Springer Berlin Heidelberg, 264–277. (LNCS.)
Abstract: This paper proposes a non-rigid registration formulation capturing both global and local deformations in a single framework. This formulation is based on a quadratic estimation of the registration distance together with a quadratic regularization term. Hence, the optimal transformation parameters are easily obtained by solving a liner system of equations, which guarantee a fast convergence. Experimental results with challenging 2D and 3D shapes are presented to show the validity of the proposed framework. Furthermore, comparisons with the most relevant approaches are provided.
|
|
|
Mohammad Rouhani and Angel Sappa. 2011. Correspondence Free Registration through a Point-to-Model Distance Minimization. 13th IEEE International Conference on Computer Vision.2150–2157.
Abstract: This paper presents a novel formulation, which derives in a smooth minimization problem, to tackle the rigid registration between a given point set and a model set. Unlike most of the existing works, which are based on minimizing a point-wise correspondence term, we propose to describe the model set by means of an implicit representation. It allows a new definition of the registration error, which works beyond the point level representation. Moreover, it could be used in a gradient-based optimization framework. The proposed approach consists of two stages. Firstly, a novel formulation is proposed that relates the registration parameters with the distance between the model and data set. Secondly, the registration parameters are obtained by means of the Levengberg-Marquardt algorithm. Experimental results and comparisons with state of the art show the validity of the proposed framework.
|
|
|
Mohammad Rouhani and Angel Sappa. 2011. Implicit B-Spline Fitting Using the 3L Algorithm. 18th IEEE International Conference on Image Processing.893–896.
|
|
|
German Ros, Jesus Martinez del Rincon and Gines Garcia-Mateos. 2012. Articulated Particle Filter for Hand Tracking. 21st International Conference on Pattern Recognition.3581–3585.
Abstract: This paper proposes a new version of Particle Filter, called Articulated Particle Filter – ArPF -, which has been specifically designed for an efficient sampling of hierarchical spaces, generated by articulated objects. Our approach decomposes the articulated motion into layers for efficiency purposes, making use of a careful modeling of the diffusion noise along with its propagation through the articulations. This produces an increase of accuracy and prevent for divergences. The algorithm is tested on hand tracking due to its complex hierarchical articulated nature. With this purpose, a new dataset generation tool for quantitative evaluation is also presented in this paper.
|
|
|
Gemma Rotger, Felipe Lumbreras, Francesc Moreno-Noguer and Antonio Agudo. 2018. 2D-to-3D Facial Expression Transfer. 24th International Conference on Pattern Recognition.2008–2013.
Abstract: Automatically changing the expression and physical features of a face from an input image is a topic that has been traditionally tackled in a 2D domain. In this paper, we bring this problem to 3D and propose a framework that given an
input RGB video of a human face under a neutral expression, initially computes his/her 3D shape and then performs a transfer to a new and potentially non-observed expression. For this purpose, we parameterize the rest shape –obtained from standard factorization approaches over the input video– using a triangular
mesh which is further clustered into larger macro-segments. The expression transfer problem is then posed as a direct mapping between this shape and a source shape, such as the blend shapes of an off-the-shelf 3D dataset of human facial expressions. The mapping is resolved to be geometrically consistent between 3D models by requiring points in specific regions to map on semantic
equivalent regions. We validate the approach on several synthetic and real examples of input faces that largely differ from the source shapes, yielding very realistic expression transfers even in cases with topology changes, such as a synthetic video sequence of a single-eyed cyclops.
|
|
|
Youssef El Rhabi, Simon Loic, Brun Luc, Josep Llados and Felipe Lumbreras. 2016. Information Theoretic Rotationwise Robust Binary Descriptor Learning. Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR).368–378.
Abstract: In this paper, we propose a new data-driven approach for binary descriptor selection. In order to draw a clear analysis of common designs, we present a general information-theoretic selection paradigm. It encompasses several standard binary descriptor construction schemes, including a recent state-of-the-art one named BOLD. We pursue the same endeavor to increase the stability of the produced descriptors with respect to rotations. To achieve this goal, we have designed a novel offline selection criterion which is better adapted to the online matching procedure. The effectiveness of our approach is demonstrated on two standard datasets, where our descriptor is compared to BOLD and to several classical descriptors. In particular, it emerges that our approach can reproduce equivalent if not better performance as BOLD while relying on twice shorter descriptors. Such an improvement can be influential for real-time applications.
|
|
|
Arnau Ramisa, Ramon Lopez de Mantaras and Ricardo Toledo. 2007. Comparing Combinations of Feature Regions for Panoramic VSLAM. 4th International Conference on Informatics in Control, Automation and Robotics.292–297.
|
|
|
Muhammad Anwer Rao, Fahad Shahbaz Khan, Joost Van de Weijer and Jorma Laaksonen. 2016. Combining Holistic and Part-based Deep Representations for Computational Painting Categorization. 6th International Conference on Multimedia Retrieval.
Abstract: Automatic analysis of visual art, such as paintings, is a challenging inter-disciplinary research problem. Conventional approaches only rely on global scene characteristics by encoding holistic information for computational painting categorization.We argue that such approaches are sub-optimal and that discriminative common visual structures provide complementary information for painting classification. We present an approach that encodes both the global scene layout and discriminative latent common structures for computational painting categorization. The region of interests are automatically extracted, without any manual part labeling, by training class-specific deformable part-based models. Both holistic and region-of-interests are then described using multi-scale dense convolutional features. These features are pooled separately using Fisher vector encoding and concatenated afterwards in a single image representation. Experiments are performed on a challenging dataset with 91 different painters and 13 diverse painting styles. Our approach outperforms the standard method, which only employs the global scene characteristics. Furthermore, our method achieves state-of-the-art results outperforming a recent multi-scale deep features based approach [11] by 6.4% and 3.8% respectively on artist and style classification.
|
|