7
votes

I'd like to represent a set of integer ranges using Python where the set could be modified dynamically and tested for inclusion. Specifically I want to apply this to address ranges or line numbers in a file.

I could define the range of addresses I cared about to include:

200 - 400  
450 - 470  
700 - 900  

Then I want to be able to add a potentially overlapping range to the set such that when I add 460 - 490 the set becomes:

200 - 400  
450 - 490  
700 - 900  

But then be able to delete from the set where I could exclude the range 300 - 350 and the set becomes:

200 - 300
350 - 400  
450 - 490  
700 - 900  

Finally, I want to be able to iterate over all integers included in the set, or test whether the set contains a particular value.

I'm wondering what the best way to do this is (particularly if there's something built into Python).

3
It's not clear whether you're looking for open intervals or closed ones here, but you don't seem to be asking for the usual half-open intervals used everywhere in Python. At any rate, there are a number of good interval/interval-set/interval-tree libraries on PyPI, with different degrees of complexity (e.g., some can mix and match all four kinds of openness, some can handle only integers while others can handle any comparable types, etc.), but Stack Overflow isn't a good place to ask for recommendations for a specific one, even if your exact requirements were 100% clear. - abarnert
As I was starting to play with interval classes I began to run across this exact issue. I suppose I really want 200-399, 450-469, ... but the intervaltree class seem to represent in a way that seems intuitive enough to me. Good to keep in mind though. - Chris W
Actually, you probably do want 200-400, and in particular the half-open interval 200-400, the one that matches Python's range(200, 400) and lst[200:400] and so on, which includes 200 but not 400. But the point is that you have to know which of the four possibilities you want before implementing it. - abarnert

3 Answers

6
votes

You're describing an interval tree.

pip install intervaltree

Usage:

from intervaltree import IntervalTree, Interval
tree = IntervalTree()
tree[200:400] = True  # or you can use ranges as the "values"
tree[450:470] = True
tree[700:900] = True

Querying:

>>> tree
IntervalTree([Interval(200, 400, True), Interval(450, 470, True), Interval(700, 900, True)])
>>> tree[250]
{Interval(200, 400, True)}
>>> tree[150]
set()

Adding overlapping range:

>>> tree[450:490] = True
>>> tree
IntervalTree([Interval(200, 400, True), Interval(450, 470, True), Interval(450, 490, True), Interval(700, 900, True)])
>>> tree.merge_overlaps()
>>> tree
IntervalTree([Interval(200, 400, True), Interval(450, 490), Interval(700, 900, True)])

Discarding:

>>> tree.chop(300, 350)
>>> tree
IntervalTree([Interval(200, 300, True), Interval(350, 400, True), Interval(450, 490), Interval(700, 900, True)])
1
votes

I implemented a similar thing in a different language.

Basic ideas:

  • Keep a tree of ranges ordered by the left bound. Instead of a real tree, you can keep a Python list and use bisect to search it in log time. This gives you the lookup / inclusion test.
  • Represent all operations as sub-range operations. A single element operation just works on a sub-range of length 1 internally.
  • Implement the basic sub-range operations: addition, which is simple, and subtraction, which may end up with two sub-ranges if a middle is excluded, or one subrange, possibly empty.
  • After an addition, check the immediate neighbor subranges, and maybe merge with one or both of them, if the subranges intersect. Continue in both directions until no merge operations occur.

This should get you started.

0
votes

Using python built-in functions you could use something along the lines of:

range_1 = range(450, 470)
range_temp = range(460, 490)

if any(num in range_1 for num in range_temp):
    start = min(range_1.start, range_temp.start)
    stop = max(range_1.stop, range_temp.stop)
    range_1 = range(start, stop)

Of course there would need to be further logic to check the overlaps of multiple range intervals.