14
votes

My math-fu is failing me! I need an efficient way of reducing network ranges to supersets, e.g. if I input list of IP ranges:

  • 1.1.1.1 to 2.2.2.5
  • 1.1.1.2 to 2.2.2.4
  • 10.5.5.5 to 155.5.5.5
  • 10.5.5.6 to 10.5.5.7

I want to return the following ranges:

  • 1.1.1.1 to 2.2.2.5
  • 10.5.5.5 to 155.5.5.5

Note: the input lists are not ordered (though they could be?). The naive way to do this is to check every range in the list to see if the input range x is a subset, and if so, NOT insert range x. However, whenever you insert a new range it might be a superset of existing ranges, so you have to check the existing ranges to see if they can be collapsed (e.g., removed from my list).

4
Are the ranges going to be disjoint? If not, how do you want your algorithm to handle overlapping ranges, many of of which a subrange could be part of?HenryR
What's the issue with using the 'naive' algorithm you describe? It seems fine to me...Mike F
Great question! The users can input whatever ranges they want, so it's not guaranteed they'd be disjoint. It would be nice if all contiguous ranges could be collapsed into one, actually.Jen A
To MikeF: the naive way could be slow as crap, if I have to do a compare against every existing item in the list whenever I insert. We have some customers with > 40k ranges (and who want to add more!). I don't want to have to do 40k comparisons to do an insert.Jen A
The naive way also would, as described, fail to handle cases where two ranges overlap without one being entirely contained by the other.Dave Sherohman

4 Answers

17
votes

This is a union of segments computation. An optimal algorithm (in O(nlog(n))) consists in doing the following:

  1. sort all endpoints (starting and ending points) in a list L (each endpoint should know the segment it belongs to). If an endpoint is equal to a starting point, the starting point should be considered smaller than the enpoint.
  2. go through the sorted list L from left to right and maintain the number LE-RE, where LE is the number of left endpoints that you have already passed, and RE is the number of right endpoints that you have already passed.
  3. each time LE-RE reaches zero, you are at the end of a connected union of segments, and you know that the union of the segments you have seen before (since the previous return to zero) is one superset.
  4. if you also maintained the min and max, between each return to zero, you have the bounds of the superset.

At the end, you obtain a sorted list of disjoint supersets. Still, two supersets A and B can be adjacent (the endpoint of A is just before the starting point of B). If you want A and B to be merged, you can do this either by a simple postprocessing step, or by slightly modifying step 3: when LE-RE reaches zero, you would consider it the end of a superset only if the next element in L is not the direct successor of your current element.

5
votes

You know that you can easily convert IPv4 addresses to int numbers (int32 numbers), do you? Working with int numbers is much easier. So basically every address is a number in the range 0 to 2^32. Every range has a start number and an end number. Your example

1.1.1.1 to 2.2.2.5
1.1.1.2 to 2.2.2.4

can be written as

16,843,009 to 33,686,021
16,843,010 to 33,686,020

So it's pretty easy to see if one range is within the other range. A range is completely within the other range if the following condition is given

startIP2 >= startIP1 && startIP2 <= endIP1 &&
endIP1 >= startIP1 && endIP2 <= endIP1

In that case the range startIP2-endIP2 is completely within startIP1-endIP1. If only the first line is true, then startIP2 is within the range startIP1-endIP1, but the end is beyond the range. If only the second line is true, the endIP is within the range, but the start IP is beyond the range. In that case, if only one line is true, you need to expand the range at the beginning or at the end. If both lines are false, the ranges are completely disjoint, in that case they are two completely independent ranges.

0
votes

What you need to do is simply check the ranges for overlap. If two ranges overlap, then they get merged into a single range. Ranges overlap if the right hand side of one range is greater than the left hand side of another.

0
votes

Alright, my coworker came up with this answer, which I think is pretty excellent. Let me know if you see any issues:

  • Order the IP ranges by StartingIP
  • For each row "x" to insert:
    • If there is a previous row "y" in the list, fetch:
      • If x and y are contiguous, extend y to x's EndingIP
      • Else if x.StartingIP <= y.StartingIP and x.EndingIP > y.EndingIP, extend y to x.EndingIP
      • Else if x is a subset of y, do nothing
      • Else, create a new range
    • Else, create a new range and insert into the list