|
Ignasi Rius, Jordi Gonzalez, Mikhail Mozerov, & Xavier Roca. (2008). Automatic Learning of 3D Pose Variability in Walking Performances for Gait Analysis. International Journal for Computational Vision and Biomechanics, 33–43.
|
|
|
Ignasi Rius, Jordi Gonzalez, Javier Varona, & Xavier Roca. (2009). Action-specific motion prior for efficient bayesian 3D human body tracking. PR - Pattern Recognition, 42(11), 2907–2921.
Abstract: In this paper, we aim to reconstruct the 3D motion parameters of a human body
model from the known 2D positions of a reduced set of joints in the image plane.
Towards this end, an action-specific motion model is trained from a database of real
motion-captured performances. The learnt motion model is used within a particle
filtering framework as a priori knowledge on human motion. First, our dynamic
model guides the particles according to similar situations previously learnt. Then, the solution space is constrained so only feasible human postures are accepted as valid solutions at each time step. As a result, we are able to track the 3D configuration of the full human body from several cycles of walking motion sequences using only the 2D positions of a very reduced set of joints from lateral or frontal viewpoints.
|
|
|
Ignasi Rius, Javier Varona, Xavier Roca, & Jordi Gonzalez. (2006). Posture Constraints for Bayesian Human Motion Tracking. In IV Conference on Articulated Motion and Deformable Objects (AMDO´06), LNCS 4069: 414–423.
|
|
|
Ignasi Rius, Javier Varona, Jordi Gonzalez, & Juan J. Villanueva. (2006). Action Spaces for Efficient Bayesian Tracking of Human Motion.
|
|
|
Ignasi Rius, Dani Rowe, Jordi Gonzalez, & Xavier Roca. (2005). A 3D Dynamic Model of Human Actions for Probabilistic Image Tracking. In Pattern Recognition and Image Analysis (IbPRIA 2005), LNCS 3522: 529–536.
|
|
|
Ignasi Rius, Dani Rowe, Jordi Gonzalez, & Xavier Roca. (2005). 3D Action Modeling and Reconstruction for 2D Human Body Tracking.
|
|
|
Ignasi Rius. (2005). Articulated 3D Human Motion Moldeling for Tracking and Reconstruction.
|
|
|
Ignasi Rius. (2010). Motion Priors for Efficient Bayesian Tracking in Human Sequence Evaluation (Jordi Gonzalez, & Xavier Roca, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: Recovering human motion by visual analysis is a challenging computer vision research
area with a lot of potential applications. Model-based tracking approaches, and in
particular particle lters, formulate the problem as a Bayesian inference task whose
aim is to sequentially estimate the distribution of the parameters of a human body
model over time. These approaches strongly rely on good dynamical and observation
models to predict and update congurations of the human body according to measurements from the image data. However, it is very dicult to design observation
models which extract useful and reliable information from image sequences robustly.
This results specially challenging in monocular tracking given that only one viewpoint
from the scene is available. Therefore, to overcome these limitations strong motion
priors are needed to guide the exploration of the state space.
The work presented in this Thesis is aimed to retrieve the 3D motion parameters
of a human body model from incomplete and noisy measurements of a monocular
image sequence. These measurements consist of the 2D positions of a reduced set of
joints in the image plane. Towards this end, we present a novel action-specic model
of human motion which is trained from several databases of real motion-captured
performances of an action, and is used as a priori knowledge within a particle ltering
scheme.
Body postures are represented by means of a simple and compact stick gure
model which uses direction cosines to represent the direction of body limbs in the 3D
Cartesian space. Then, for a given action, Principal Component Analysis is applied to
the training data to perform dimensionality reduction over the highly correlated input
data. Before the learning stage of the action model, the input motion performances
are synchronized by means of a novel dense matching algorithm based on Dynamic
Programming. The algorithm synchronizes all the motion sequences of the same
action class, nding an optimal solution in real-time.
Then, a probabilistic action model is learnt, based on the synchronized motion
examples, which captures the variability and temporal evolution of full-body motion
within a specic action. In particular, for each action, the parameters learnt are: a
representative manifold for the action consisting of its mean performance, the standard deviation from the mean performance, the mean observed direction vectors from
each motion subsequence of a given length and the expected error at a given time
instant.
Subsequently, the action-specic model is used as a priori knowledge on human
motion which improves the eciency and robustness of the overall particle filtering tracking framework. First, the dynamic model guides the particles according to similar
situations previously learnt. Then, the state space is constrained so only feasible
human postures are accepted as valid solutions at each time step. As a result, the
state space is explored more eciently as the particle set covers the most probable
body postures.
Finally, experiments are carried out using test sequences from several motion
databases. Results point out that our tracker scheme is able to estimate the rough
3D conguration of a full-body model providing only the 2D positions of a reduced
set of joints. Separate tests on the sequence synchronization method and the subsequence probabilistic matching technique are also provided.
|
|
|
Idoia Ruiz, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, & Joan Serrat. (2021). Weakly Supervised Multi-Object Tracking and Segmentation. In IEEE Winter Conference on Applications of Computer Vision Workshops (pp. 125–133).
Abstract: We introduce the problem of weakly supervised MultiObject Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation.
To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by
Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the
objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just 12% and 12.7 % for cars and pedestrians, respectively.
|
|
|
Idoia Ruiz, & Joan Serrat. (2020). Rank-based ordinal classification. In 25th International Conference on Pattern Recognition (pp. 8069–8076).
Abstract: Differently from the regular classification task, in ordinal classification there is an order in the classes. As a consequence not all classification errors matter the same: a predicted class close to the groundtruth one is better than predicting a farther away class. To account for this, most previous works employ loss functions based on the absolute difference between the predicted and groundtruth class labels. We argue that there are many cases in ordinal classification where label values are arbitrary (for instance 1. . . C, being C the number of classes) and thus such loss functions may not be the best choice. We instead propose a network architecture that produces not a single class prediction but an ordered vector, or ranking, of all the possible classes from most to least likely. This is thanks to a loss function that compares groundtruth and predicted rankings of these class labels, not the labels themselves. Another advantage of this new formulation is that we can enforce consistency in the predictions, namely, predicted rankings come from some unimodal vector of scores with mode at the groundtruth class. We compare with the state of the art ordinal classification methods, showing
that ours attains equal or better performance, as measured by common ordinal classification metrics, on three benchmark datasets. Furthermore, it is also suitable for a new task on image aesthetics assessment, i.e. most voted score prediction. Finally, we also apply it to building damage assessment from satellite images, providing an analysis of its performance depending on the degree of imbalance of the dataset.
|
|
|
Idoia Ruiz, & Joan Serrat. (2022). Hierarchical Novelty Detection for Traffic Sign Recognition. SENS - Sensors, 22(12), 4389.
Abstract: Recent works have made significant progress in novelty detection, i.e., the problem of detecting samples of novel classes, never seen during training, while classifying those that belong to known classes. However, the only information this task provides about novel samples is that they are unknown. In this work, we leverage hierarchical taxonomies of classes to provide informative outputs for samples of novel classes. We predict their closest class in the taxonomy, i.e., its parent class. We address this problem, known as hierarchical novelty detection, by proposing a novel loss, namely Hierarchical Cosine Loss that is designed to learn class prototypes along with an embedding of discriminative features consistent with the taxonomy. We apply it to traffic sign recognition, where we predict the parent class semantics for new types of traffic signs. Our model beats state-of-the art approaches on two large scale traffic sign benchmarks, Mapillary Traffic Sign Dataset (MTSD) and Tsinghua-Tencent 100K (TT100K), and performs similarly on natural images benchmarks (AWA2, CUB). For TT100K and MTSD, our approach is able to detect novel samples at the correct nodes of the hierarchy with 81% and 36% of accuracy, respectively, at 80% known class accuracy.
Keywords: Novelty detection; hierarchical classification; deep learning; traffic sign recognition; autonomous driving; computer vision
|
|
|
Idoia Ruiz, Bogdan Raducanu, Rakesh Mehta, & Jaume Amores. (2020). Optimizing speed/accuracy trade-off for person re-identification via knowledge distillation. EAAI - Engineering Applications of Artificial Intelligence, 87, 103309.
Abstract: Finding a person across a camera network plays an important role in video surveillance. For a real-world person re-identification application, in order to guarantee an optimal time response, it is crucial to find the balance between accuracy and speed. We analyse this trade-off, comparing a classical method, that comprises hand-crafted feature description and metric learning, in particular, LOMO and XQDA, to deep learning based techniques, using image classification networks, ResNet and MobileNets. Additionally, we propose and analyse network distillation as a learning strategy to reduce the computational cost of the deep learning approach at test time. We evaluate both methods on the Market-1501 and DukeMTMC-reID large-scale datasets, showing that distillation helps reducing the computational cost at inference time while even increasing the accuracy performance.
Keywords: Person re-identification; Network distillation; Image retrieval; Model compression; Surveillance
|
|
|
Idoia Ruiz. (2022). Deep Metric Learning for re-identification, tracking and hierarchical novelty detection (Joan Serrat, Ed.). Ph.D. thesis, , .
Abstract: Metric learning refers to the problem in machine learning of learning a distance or similarity measurement to compare data. In particular, deep metric learning involves learning a representation, also referred to as embedding, such that in the embedding space data samples can be compared based on the distance, directly providing a similarity measure. This step is necessary to perform several tasks in computer vision. It allows to perform the classification of images, regions or pixels, re-identification, out-of-distribution detection, object tracking in image sequences and any other task that requires computing a similarity score for their solution. This thesis addresses three specific problems that share this common requirement. The first one is person re-identification. Essentially, it is an image retrieval task that aims at finding instances of the same person according to a similarity measure. We first compare in terms of accuracy and efficiency, classical metric learning to basic deep learning based methods for this problem. In this context, we also study network distillation as a strategy to optimize the trade-off between accuracy and speed at inference time. The second problem we contribute to is novelty detection in image classification. It consists in detecting samples of novel classes, i.e. never seen during training. However, standard novelty detection does not provide any information about the novel samples besides they are unknown. Aiming at more informative outputs, we take advantage from the hierarchical taxonomies that are intrinsic to the classes. We propose a metric learning based approach that leverages the hierarchical relationships among classes during training, being able to predict the parent class for a novel sample in such hierarchical taxonomy. Our third contribution is in multi-object tracking and segmentation. This joint task comprises classification, detection, instance segmentation and tracking. Tracking can be formulated as a retrieval problem to be addressed with metric learning approaches. We tackle the existing difficulty in academic research that is the lack of annotated benchmarks for this task. To this matter, we introduce the problem of weakly supervised multi-object tracking and segmentation, facing the challenge of not having available ground truth for instance segmentation. We propose a synergistic training strategy that benefits from the knowledge of the supervised tasks that are being learnt simultaneously.
|
|
|
Iban Berganzo-Besga, Hector A. Orengo, Felipe Lumbreras, Paloma Aliende, & Monica N. Ramsey. (2022). Automated detection and classification of multi-cell Phytoliths using Deep Learning-Based Algorithms. JArchSci - Journal of Archaeological Science, 148, 105654.
Abstract: This paper presents an algorithm for automated detection and classification of multi-cell phytoliths, one of the major components of many archaeological and paleoenvironmental deposits. This identification, based on phytolith wave pattern, is made using a pretrained VGG19 deep learning model. This approach has been tested in three key phytolith genera for the study of agricultural origins in Near East archaeology: Avena, Hordeum and Triticum. Also, this classification has been validated at species-level using Triticum boeoticum and dicoccoides images. Due to the diversity of microscopes, cameras and chemical treatments that can influence images of phytolith slides, three types of data augmentation techniques have been implemented: rotation of the images at 45-degree angles, random colour and brightness jittering, and random blur/sharpen. The implemented workflow has resulted in an overall accuracy of 93.68% for phytolith genera, improving previous attempts. The algorithm has also demonstrated its potential to automatize the classification of phytoliths species with an overall accuracy of 100%. The open code and platforms employed to develop the algorithm assure the method's accessibility, reproducibility and reusability.
|
|
|
Iban Berganzo-Besga, Hector A. Orengo, Felipe Lumbreras, Aftab Alam, Rosie Campbell, Petrus J Gerrits, et al. (2023). Curriculum learning-based strategy for low-density archaeological mound detection from historical maps in India and Pakistan. ScR - Scientific Reports, 13, 11257.
Abstract: This paper presents two algorithms for the large-scale automatic detection and instance segmentation of potential archaeological mounds on historical maps. Historical maps present a unique source of information for the reconstruction of ancient landscapes. The last 100 years have seen unprecedented landscape modifications with the introduction and large-scale implementation of mechanised agriculture, channel-based irrigation schemes, and urban expansion to name but a few. Historical maps offer a window onto disappearing landscapes where many historical and archaeological elements that no longer exist today are depicted. The algorithms focus on the detection and shape extraction of mound features with high probability of being archaeological settlements, mounds being one of the most commonly documented archaeological features to be found in the Survey of India historical map series, although not necessarily recognised as such at the time of surveying. Mound features with high archaeological potential are most commonly depicted through hachures or contour-equivalent form-lines, therefore, an algorithm has been designed to detect each of those features. Our proposed approach addresses two of the most common issues in archaeological automated survey, the low-density of archaeological features to be detected, and the small amount of training data available. It has been applied to all types of maps available of the historic 1″ to 1-mile series, thus increasing the complexity of the detection. Moreover, the inclusion of synthetic data, along with a Curriculum Learning strategy, has allowed the algorithm to better understand what the mound features look like. Likewise, a series of filters based on topographic setting, form, and size have been applied to improve the accuracy of the models. The resulting algorithms have a recall value of 52.61% and a precision of 82.31% for the hachure mounds, and a recall value of 70.80% and a precision of 70.29% for the form-line mounds, which allowed the detection of nearly 6000 mound features over an area of 470,500 km2, the largest such approach to have ever been applied. If we restrict our focus to the maps most similar to those used in the algorithm training, we reach recall values greater than 60% and precision values greater than 90%. This approach has shown the potential to implement an adaptive algorithm that allows, after a small amount of retraining with data detected from a new map, a better general mound feature detection in the same map.
|
|