2 votes

I'm currently implementing a KD Tree and nearest neighbour search, following the algorithm described here: http://ldots.org/kdtree/

I have come across a couple of different ways to implement a KD Tree: one in which points are stored in internal nodes, and one in which they are stored only in leaf nodes. As I have a very simple use case (all I need to do is construct the tree once; it does not need to be modified), I went for the leaf-only approach, as it seemed simpler to implement. I have implemented everything successfully: the tree is always constructed correctly, and in most cases the nearest neighbour search returns the correct value. However, with some data sets and search points the algorithm returns an incorrect value. Consider the points:

[[6, 1], [5, 5], [9, 6], [3, 81], [4, 9], [4, 0], [7, 9], [2, 9], [6, 74]]

Which constructs a tree looking something like this (excuse my bad diagramming): [diagram of the KD Tree]

The square leaf nodes are those that contain the points, and the circular nodes contain the median value used to split the list at that depth. When calling my nearest neighbour search on this data set, looking for the nearest neighbour to [6, 74], the algorithm returns [7, 9]. Although this follows the algorithm correctly, it is not in fact the closest point to [6, 74]. The closest point is actually [3, 81], which is at a distance of 7.6; [7, 9] is at a distance of 65.
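For reference, the two distances are easy to verify with plain Euclidean distance (the class name here is arbitrary and only for illustration):

    class DistanceCheck {
        public static void main(String[] args) {
            // Distances from the query point [6, 74]:
            System.out.println(Math.hypot(6 - 3, 74 - 81)); // ~7.62 -> [3, 81]
            System.out.println(Math.hypot(6 - 7, 74 - 9));  // ~65.0 -> [7, 9]
        }
    }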

Here are the points plotted for visualization, the red point being the one I am attempting to find the nearest neighbour for:

[scatter plot of the points]

If it helps, my search method is as follows:

    private LeafNode search(int depth, Point point, KDNode node) {
        if(node instanceof LeafNode)
            return (LeafNode)node;
        else {
            MedianNode medianNode = (MedianNode) node;

            double meanValue = medianNode.getValue();
            double comparisonValue = 0;
            if(valueEven(depth)) {
                comparisonValue = point.getX();
            }
            else {
                comparisonValue = point.getY();
            }

            KDNode nextNode;
            if(comparisonValue < meanValue) {
                if (node.getLeft() != null)
                    nextNode = node.getLeft();
                else
                    nextNode = node.getRight();
            }
            else {
                if (node.getRight() != null)
                    nextNode = node.getRight();
                else
                    nextNode = node.getLeft();
            }

            return search(depth + 1, point, nextNode);
        }
    }

So my questions are:

  1. Is this what I should expect from nearest neighbour search in a KD Tree, or should it return the closest point to the point I am searching for (which is my only reason for using the tree)?

  2. Is this an issue only with this form of KD Tree? Should I change it to store points in inner nodes to solve this?


3 Answers

2 votes

A correct implementation of a KD-tree always finds the closest point (it doesn't matter whether points are stored only in leaves or not). Your search method is not correct, though. Here is how it should look:

bestDistance = INF

def getClosest(node, point)
    if node is null
        return
    // I will assume that this node splits points 
    // by their x coordinate for the sake of brevity.
    if node is a leaf
        // updateAnswer updates bestDistance value
        // and keeps track of the closest point to the given one.
        updateAnswer(node.point, point)
    else
        middleX = node.median
        if point.x < middleX
            getClosest(node.left, point)
            // minX is the smallest x coordinate of any point in the right subtree;
            // only descend there if it could still contain a closer point.
            if node.right.minX - point.x < bestDistance
                getClosest(node.right, point)
        else
            getClosest(node.right, point)
            // maxX is the largest x coordinate of any point in the left subtree.
            if point.x - node.left.maxX < bestDistance
                getClosest(node.left, point)
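If it helps to see the same idea in the question's own terms, here is a rough Java sketch. It assumes the node types and getters shown in the question plus a getPoint() accessor on LeafNode (not shown there), and it prunes with the distance to the splitting plane rather than the subtree bounds used above, which is a slightly looser but still correct test:

    private Point best;
    private double bestDistance = Double.POSITIVE_INFINITY;

    private void getClosest(KDNode node, Point point, int depth) {
        if (node == null)
            return;
        if (node instanceof LeafNode) {
            // A leaf holds an actual data point: see if it beats the best so far.
            // getPoint() is assumed; the question does not show LeafNode's accessor.
            Point candidate = ((LeafNode) node).getPoint();
            double d = Math.hypot(candidate.getX() - point.getX(),
                                  candidate.getY() - point.getY());
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
            return;
        }
        MedianNode medianNode = (MedianNode) node;
        double median = medianNode.getValue();
        // Even depths split on x, odd depths on y, as in the question.
        double coordinate = (depth % 2 == 0) ? point.getX() : point.getY();

        KDNode near = (coordinate < median) ? node.getLeft() : node.getRight();
        KDNode far = (coordinate < median) ? node.getRight() : node.getLeft();

        // Always search the side the query point falls on first...
        getClosest(near, point, depth + 1);
        // ...then backtrack into the other side only if the splitting plane
        // is closer than the best match found so far.
        if (Math.abs(coordinate - median) < bestDistance)
            getClosest(far, point, depth + 1);
    }

Call it with the root and depth 0, then read best afterwards; best and bestDistance would need resetting before each query.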
2 votes

The explanation given on ldots.org is just plain wrong (along with many other top Google results on searching KD Trees).

See https://stackoverflow.com/a/37107030/591720 for a correct implementation.

0 votes

Not sure if this answer is still relevant, but I dare to suggest the following kd-tree implementation: https://github.com/stanislav-antonov/kdtree

The implementation is simple enough and could be useful if one wants to work out how things behave in practice.

Regarding the way the tree is built, an iterative approach is used, so the tree's size is limited by available memory rather than by the stack size.
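To illustrate the point about iterative construction, here is a generic sketch (not code from that repository) of building a kd-tree with an explicit work stack, so the depth of the tree is bounded by heap memory rather than by the call stack. For brevity it stores points in internal nodes, unlike the leaf-only tree in the question:

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Deque;

    class IterativeKDTreeBuilder {

        static class Node {
            double[] point;
            Node left, right;
            Node(double[] point) { this.point = point; }
        }

        // A pending subtree: the slice of points it covers, its depth,
        // and where to attach the finished node.
        static class Task {
            final int lo, hi, depth;
            final Node parent;
            final boolean isLeft;
            Task(int lo, int hi, int depth, Node parent, boolean isLeft) {
                this.lo = lo; this.hi = hi; this.depth = depth;
                this.parent = parent; this.isLeft = isLeft;
            }
        }

        static Node build(double[][] points) {
            Node root = null;
            Deque<Task> work = new ArrayDeque<>();
            work.push(new Task(0, points.length, 0, null, false));
            while (!work.isEmpty()) {
                Task t = work.pop();
                if (t.lo >= t.hi)
                    continue;
                final int axis = t.depth % 2; // alternate x / y splits
                // Sort this slice on the current axis and split at the median.
                Arrays.sort(points, t.lo, t.hi,
                        Comparator.comparingDouble((double[] p) -> p[axis]));
                int mid = (t.lo + t.hi) / 2;
                Node node = new Node(points[mid]);
                if (t.parent == null)
                    root = node;
                else if (t.isLeft)
                    t.parent.left = node;
                else
                    t.parent.right = node;
                work.push(new Task(t.lo, mid, t.depth + 1, node, true));
                work.push(new Task(mid + 1, t.hi, t.depth + 1, node, false));
            }
            return root;
        }
    }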