What is the generally accepted way to handle infinite LOF scores in ELKI that are caused by duplicate points? If ELKI's LOF scores are to be used, should such scores be treated as maximum scores, as zeros, or as inliers?
1 Answer
The LOF score of a point is infinite if at least one of its neighbors has an infinite local reachability density (lrd), which happens when all reachability distances of that neighbor are 0 (because it lies among duplicate points).
If the point itself has a non-zero reachability distance, its own lrd is finite, so the lrd of those neighbors is infinitely higher than its own (or in terms of density: the point is infinitely less dense than its neighbors), and it is an outlier.
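To make this concrete, here is a minimal pure-Python sketch of textbook LOF. It is not ELKI's implementation. The function name `lof_scores`, the toy data, the choice to exclude the query point from its own neighborhood, the simplified tie handling, and the convention of treating inf/inf as 1 are all assumptions made for illustration.

```python
import math

def lof_scores(points, k):
    """Textbook LOF sketch (hypothetical helper, not ELKI's code).

    Assumptions: the query point is excluded from its own neighborhood,
    and ties at the k-distance are cut off at exactly k neighbors.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    knn, kdist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn.append(order[:k])
        kdist.append(dist[i][order[k - 1]])

    # lrd(i) = k / sum of reachability distances from i to its neighbors.
    # A zero sum (only duplicates among the neighbors) gives an infinite lrd.
    lrd = []
    for i in range(n):
        total = sum(max(kdist[j], dist[i][j]) for j in knn[i])
        lrd.append(math.inf if total == 0 else k / total)

    scores = []
    for i in range(n):
        ratios = []
        for j in knn[i]:
            if math.isinf(lrd[i]) and math.isinf(lrd[j]):
                ratios.append(1.0)  # assumed convention: duplicates among duplicates are inliers
            else:
                ratios.append(lrd[j] / lrd[i])
        scores.append(sum(ratios) / k)
    return scores

# Three duplicates at 0.0, plus two other points.
points = [(0.0,), (0.0,), (0.0,), (1.0,), (5.0,)]

# k=2: each duplicate sees only other duplicates, so its lrd is infinite,
# and the points at 1.0 and 5.0 get an infinite LOF.
print(lof_scores(points, 2))

# k=3: every neighborhood now reaches past the duplicates, so all scores are finite.
print(lof_scores(points, 3))
```

With `k=2` the duplicates themselves score 1.0 under the assumed convention, while the two non-duplicate points score infinity. Note that ELKI may count neighbors differently (for example, whether the query point is included), so the exact threshold for k can differ by one.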
The proper way of handling this is to increase k (minpts) so that it is larger than the maximum number of duplicate points. If you have too many duplicate points, this usually indicates that LOF may not be a good idea for this data set. LOF requires that a nearest-neighbor density estimate makes sense on the data, and if you run into this kind of problem, the cause is usually the input data, not the algorithm.
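As a rough way to pick a lower bound for k, you could count the largest group of identical points before running LOF. The sketch below assumes exact duplicates (bit-identical coordinates); the helper name `max_duplicate_count` and the sample data are hypothetical, and the resulting bound is only a starting point, not a recommended value.

```python
from collections import Counter

def max_duplicate_count(points):
    """Size of the largest group of exactly identical points (0 for empty input)."""
    counts = Counter(tuple(p) for p in points)
    return max(counts.values(), default=0)

points = [(0.0,), (0.0,), (0.0,), (1.0,), (5.0,)]

largest_group = max_duplicate_count(points)
# k must be strictly larger than the largest duplicate group.
min_k = largest_group + 1
print(largest_group, min_k)
```

If `largest_group` turns out to be a large fraction of the data set, that is the sign mentioned above that a nearest-neighbor density estimate may not be meaningful here; deduplicating the input first is another option to consider.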