1
votes

I have a list of intervals and I need to return the ones that overlap with an interval passed in a query. What is special is that in a typical query around a third or even half of the intervals will overlap with the one given in the query. Also, the ratio of the shortest interval to the longest is not more than 1:5. I implemented my own interval tree (augmented red-black tree) - I did not want to use existing implementations because I needed support for closed intervals and some special features. I tested the query speed with 6000 queries in a tree with 6000 intervals (so n=6000 and m=3000 (app.)). It turned out that brute force is just as good as using the tree:

Computation time - loop: 125.220461 s
Tree setup: 0.05064 s
Tree Queries: 123.167337 s

Let me use asymptotic analysis. n: number of queries; n: number of intervals; app. n/2: number of intervals returned in a query:

time complexity brute force: n*n

time complexity tree: n*(log(n)+n/2) --> 1/2 nn + nlog(n) --> n*n

So the result is saying that the two should be roughly the same for a large n. Still one would somehow expect the tree to be noticeably faster given the constant 1/2 in front of n*n. So there are three possible reasons I can imagine for the results I got:

a) My implementation is wrong. (Should I be using BFS like below?) b) My implementation is right, but I made things cumbersome for Python so it needs more time to deal with the tree than to deal with brute force. c) everything is OK - it is just how things should behave for a large n

My query function looks like this:

from collections import deque

def query(self,low,high):
    result = []
    q = deque([self.root]) # this is the root node in the tree
    append_result = result.append
    append_q = q.append
    pop_left = q.popleft
    while q:
        node = pop_left() # look at the next node
        if node.overlap(low,high): # some overlap?
            append_result(node.interval)
        if node.low != None and low <= node.get_low_max(): # en-q left node
            append_q(node.low)                
        if node.high != None and node.get_high_min() <= high: # en-q right node
            append_q(node.high)

I build the tree like this:

def build(self, intervals):
    """
    Function which is recursively called to build the tree.
    """
    if intervals is None:
        return None

    if len(intervals) > 2: # intervals is always sorted in increasing order
        mid = len(intervals)//2
        # split intervals into three parts:
        # central element (median)
        center = intervals[mid]
        # left half (<= median)
        new_low = intervals[:mid]
        #right half (>= median)
        new_high = intervals[mid+1:]
        #compute max on the lower side (left):
        max_low = max([n.get_high() for n in new_low])
        #store min on the higher side (right):
        min_high = new_high[0].get_low()

    elif len(intervals) == 2:
        center = intervals[1]
        new_low = [intervals[0]]
        new_high = None
        max_low = intervals[0].get_high()
        min_high = None

    elif len(intervals) == 1:
        center = intervals[0]
        new_low = None
        new_high = None
        max_low = None
        min_high = None

    else:
        raise Exception('The tree is not behaving as it should...')

    return(Node(center, self.build(new_low),self.build(new_high),
                max_low, min_high))

EDIT:

A node is represented like this:

class Node:
    def __init__(self, interval, low, high, max_low, min_high):
        self.interval = interval # pointer to corresponding interval object
        self.low = low # pointer to node containing intervals to the left
        self.high = high # pointer to node containing intervals to the right
        self.max_low = max_low # maxiumum value on the left side
        self.min_high = min_high # minimum value on the right side

All the nodes in a subtree can be obtained like this:

def subtree(current):
    node_list = []
    if current.low != None:
        node_list += subtree(current.low)
    node_list += [current]
    if current.high != None:
        node_list += subtree(current.high)
    return node_list

p.s. note that by exploiting that there is so much overlap and that all intervals have comparable lenghts, I managed to implement a simple method based on sorting and bisection that completed in 80 s, but I would say this is over-fitting... Amusingly, by using asymptotic analysis, I found it should have app. the same runtime as using the tree...

1
Asymptotic analysis tells almost nothing about the actual running time :) also, have you tried using a profiler?BlackBear

1 Answers

0
votes

If I correctly understand your problem, you are trying to speed up your process. If it is that, try to create a real tree instead of manipulating lists.

Something that looks like :

class IntervalTreeNode():
    def __init__(self, parent, min, max):
        self.value      = (min,max)
        self.parent     = parent

        self.leftBranch = None
        self.rightBranch= None

    def insert(self, interval):
        ...

    def asList(self):
        """ return the list that is this node and all the subtree nodes """
        left=[]
        if (self.leftBranch != None):
            left = self.leftBranch.asList()
        right=[]
        if (self.rightBranch != None):
            left = self.rightBranch.asList()
        return [self.value] + left + right

And then at start create an internalTreeNode and insert all yours intervals in. This way, if you really need a list you can build a list each time you need a result and not each time you make a step in your recursive iteration using [x:] or [:x] as list manipulation is a costly operation in python. It is possible to work also using directly the nodes instead of a list that will greatly speed up the process as you just have to return a reference to the node instead of doing some list addition.