A Flexible Outlier Detector Based on a Topology Given by Graph Communities
Oriol Ramos Terrades
author
Albert Berenguel
author
Debora Gil
author
2022
Outlier detection is essential for the optimal performance of machine learning methods and statistical predictive models. Detecting outliers is especially critical in small-sample-size, unbalanced problems, since in such settings outliers become highly influential and significantly bias models. These experimental settings are common in medical applications, such as the diagnosis of rare pathologies, the outcome of experimental personalized treatments or pandemic emergencies. In contrast to population-based methods, neighborhood-based local approaches compute an outlier score from the neighbors of each sample; they are simple, flexible methods with the potential to perform well in small-sample-size, unbalanced problems. A main concern of local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters, such as the number of neighbors.
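To make the parameter sensitivity described above concrete, here is a minimal sketch of a generic neighborhood-based local score. It is not the authors' method: the function name `knn_label_outlier_scores`, the Euclidean distance and the scoring rule (fraction of the k nearest neighbors whose label differs from the sample's own) are illustrative assumptions, and the toy data are made up for the example.

```python
import math


def knn_label_outlier_scores(points, labels, k):
    """Hypothetical generic local score: fraction of the k nearest neighbors
    (Euclidean distance, sample itself excluded) whose label differs."""
    scores = []
    for i, p in enumerate(points):
        # Rank every other sample by distance to sample i; ties broken by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors = [j for _, j in ranked[:k]]
        disagree = sum(1 for j in neighbors if labels[j] != labels[i])
        scores.append(disagree / len(neighbors))
    return scores


# Made-up toy data: two well-separated clusters; sample 3 at (1, 1) carries
# class 1 although it sits among the class-0 samples.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

scores_k3 = knn_label_outlier_scores(points, labels, k=3)
scores_k7 = knn_label_outlier_scores(points, labels, k=7)
```

With k=3 the mislabeled sample (1, 1) receives the highest score (1.0). With k=7 every neighborhood spans both clusters, and the same sample drops to 3/7, below its correctly labeled neighbors (5/7): a single fixed-size neighborhood can hide the outlier when k is poorly chosen.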
This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph that codifies mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world and synthetic data sets show that our approach outperforms both local and global strategies in multi-view and single-view settings.
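The pipeline summarized above (mutual nearest-neighbor graph, graph communities, label heterogeneity within each community) can be sketched as follows. This is a hedged illustration under assumptions not stated in the abstract: the edge weight 1/(1 + distance), the use of weighted label propagation as the community detection algorithm (a standard method swapped in here, not necessarily the one used in the paper), the disagreement fraction as the heterogeneity measure and the score 0.0 for singleton communities are all choices made for the example. It also builds a single partition from a fixed k, whereas the paper describes multiple neighborhoods. All function names are hypothetical.

```python
import math
from collections import defaultdict


def knn(points, k):
    """Indices of the k nearest neighbors of every sample (self excluded)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def mutual_knn_graph(points, k):
    """Weighted adjacency dict with an edge i-j only if each sample is among
    the other's k nearest neighbors; weight 1 / (1 + distance) is assumed."""
    neigh = [set(n) for n in knn(points, k)]
    graph = {i: {} for i in range(len(points))}
    for i in range(len(points)):
        for j in neigh[i]:
            if i in neigh[j]:
                w = 1.0 / (1.0 + math.dist(points[i], points[j]))
                graph[i][j] = w
                graph[j][i] = w
    return graph


def label_propagation(graph, max_iter=100):
    """Weighted label propagation: each node adopts the community with the
    largest total edge weight among its neighbors (ties -> smallest id)."""
    comm = {i: i for i in graph}
    for _ in range(max_iter):
        changed = False
        for i in sorted(graph):
            if not graph[i]:
                continue  # isolated node keeps its own community
            totals = defaultdict(float)
            for j, w in graph[i].items():
                totals[comm[j]] += w
            best = max(totals.items(), key=lambda kv: (kv[1], -kv[0]))[0]
            if best != comm[i]:
                comm[i] = best
                changed = True
        if not changed:
            break
    return comm


def community_outlier_scores(points, labels, k=3):
    """Score = fraction of the other members of a sample's community whose
    label differs from the sample's own label."""
    comm = label_propagation(mutual_knn_graph(points, k))
    members = defaultdict(list)
    for i, c in comm.items():
        members[c].append(i)
    scores = []
    for i in range(len(points)):
        others = [j for j in members[comm[i]] if j != i]
        if not others:
            scores.append(0.0)  # singleton community: no label evidence (assumption)
            continue
        scores.append(sum(labels[j] != labels[i] for j in others) / len(others))
    return scores, comm


# Made-up toy data: sample 3 at (1, 1) is labeled 1 inside a class-0 cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
scores, communities = community_outlier_scores(points, labels, k=3)
```

On this toy set the mutual 3-NN graph splits into two communities, one per cluster, and sample 3 obtains the highest score (1.0). The design point is that the neighborhood of each sample is its community, whose size follows the graph structure instead of being fixed by k alone.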
Classification algorithms
Detection algorithms
Description of feature space local structure
Graph communities
Machine learning algorithms
Outlier detectors
DAG; IAM; 600.140; 600.121; 600.139; 600.145; 600.159
exported from refbase (http://refbase.cvc.uab.es/show.php?record=3718), last updated on Wed, 28 Sep 2022 12:20:47 +0200
text
10.1016/j.bdr.2022.100332
Big Data Research
BDR
2022
continuing
periodical
academic journal
29
100332