Generating non-consecutive combinations

Question

I am trying to create a generator (iterator which supports doing a next, perhaps using yield in python) which gives all combinations of r elements from {1,2,...n} (n and r are parameters) such that in the selected r elements, no two are consecutive.

For example, for r = 2 and n= 4

The generated combinations are {1,3}, {1,4}, {2, 4}.

I could generate all combinations(as an iterator) and filter those which don't satisfy the criteria, but we will be doing unnecessary work.

Is there some generation algorithm such that the next is O(1) (and if that is not possible, O(r) or O(n)).

The order in which the sets are returned is not relevant (and hopefully will allow an O(1) algorithm).

Note: I have tagged it python, but a language-agnostic algorithm will help too.

Update:

I have found a way to map it to generating pure combinations! A web search reveals that O(1) is possible for combinations (though it seems complicated).

Here is the mapping.

Suppose we have a combination x_1, x_2, ... , x_r with x_1 + 1 < x_2, x_2 + 1 < x_3, ...

We map to y_1, y_2, ..., y_r as follows

y_1 = x_1

y_2 = x_2 - 1

y_3 = x_3 - 2

...

y_r = x_r - (r-1)

This way we have that y_1 < y_2 < y_3 ... without the non-consecutive constraint!

This basically amounts to choosing r elements out of n-r+1. Thus all I need to do is run the generation for (n-r+1 choose r).

For our purposes, using the mapping after things are generated is good enough.

Reasons for choosing svkcr's answer

All great answers, but I have chosen svkcr's answer.

Here are some reasons why

It is effectively stateless (or "Markovian" to be more precise). The next permutation can be generated from the previous one. It is in a way almost optimal: O(r) space and time.
It is predictable. We know exactly the order(lexicographic) in which the combinations are generated.

These two properties make it easy to parallelise the generation (split at predictable points and delegate), with fault tolerance thrown in (can pick off from the last generated combination if a CPU/machine fails)!

Sorry, parallelisation was not mentioned earlier, because it didn't occur to me when I wrote the question and I got that idea only later.

Isn't generating and filtering going to be O(n)? Or, actually, O(r)? There's only one illegal value at each slot from 1 to r, so at most (r-1) combinations to skip. — abarnert
PS, "generating all combinations(as an iterator)" is a one-liner with itertools. — abarnert
@abarnert: It could potentially be Omega(nr) (or worse) can't it? Thanks for tip about itertools. — Knoothe
How could it be nr? One of us needs to think this through in detail. But keep in mind that you can assume the combinations arrive in sorted order. — abarnert
@abarnert: The underlying generator could be Omega(n), and so if you skip r calls to next, you have Omega(nr). You are right, though, more thought needs to be given (by me). — Knoothe

svckr svckr · Accepted Answer · 2013-03-15T01:09:17

This is fun! How about this:

def nonconsecutive_combinations(r, n):
  # first combination, startng at 1, step-size 2
  combination = range(1, r*2, 2)
  # as long as all items are less than or equal to n
  while combination[r-1] <= n:
    yield tuple(combination)
    p = r-1 # pointer to the element we will increment
    a = 1   # number that will be added to the last element
    # find the rightmost element we can increment without
    # making the last element bigger than n
    while p > 0 and combination[p] + a > n:
      p -= 1
      a += 2
    # increment the item and
    # fill the tail of the list with increments of two
    combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)

Each next() call should have an O(r) .. I got the idea while thinking about how this would translate to natural numbers, but it took quite some time to get the details right.

> list(nonconsecutive_combinations(2, 4))
[(1, 3), (1, 4), (2, 4)]
> list(nonconsecutive_combinations(3, 6))
[(1, 3, 5), (1, 3, 6), (1, 4, 6), (2, 4, 6)]

Let me try to explain how this works.

Conditions for a tuple c with r elements to be part of the result set:

Any element of the tuple is at least as large as the preceding element plus 2. (c[x] >= c[x-1] + 2)
All elements are less than, or equal to n. Because of 1. it is sufficient to say that the last element is less than or equal to n. (c[r-1] <= n)

The smallest tuple that may be part of the result set is (1, 3, 5, ..., 2*r-1). When I say a tuple is "smaller" than another, I am assuming the lexicographical order.

As Blckknght is pointing out, even the smallest possible tuple may be to large to satisfy condition 2.

The function above contains two while loops:

The outer loop steps through the results and assumes they appear in lexicographical order and satisfy condition one. As soon as the tuple in question violates condition two, we know that we have exhausted the result set and are done:
```
combination = range(1, r*2, 2)
while combination[r-1] <= n:
```
The first line initializes the result-tuple with the first possible result according to condition one. Line two squarely translates to condition two.
The inner loop finds the next possible tuple satisfying condition one.
```
yield tuple(combination)
```
Since the while condition (2) is true and we made sure the result satisfies condition one we can yield the current result-tuple.

Next, to find the lexicographically next tuple, we would add "1" to the last element.
```
# we don't actually do this:
combination[r-1] += 1
```
However, that may break condition 2 too early. So, if that operation would break condition 2, we increment preceding element and adjust the last element accordingly. This is a little like counting integers base 10: "If the last digit is larger than 9, increment the previous digit and make the last digit a 0." But instead of adding zeros, we fill the tuple so that condition 1 is true.
```
# if above does not work
combination[r-2] += 1
combination[r-1]  = combination[r-2] + 2
```
Problem is, the second line may break condition two yet again. So what we actually do is, we keep track of the last element and that is what is done with the a. Also we use the p variable to refer to the index current element we are looking at.
```
p = r-1
a = 1
while p > 0 and combination[p] + a > n:
  p -= 1
  a += 2
```
We are iterating right-to-left (p = r-1, p -= 1) through the items of the result tuple. Initially we want to add one to the last item (a = 1) but when stepping through the tuple we actually want to replace the last item with the value of a preceding item plus 2*x, where x is the distance between the two items. (a += 2, combination[p] + a)

Finally, we have found the item we want to increment, and fill the rest of the tuple with a sequence starting at the incremented item, with a step size of 2:
```
combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)
```
And that's it. It seemed so easy when I first thought of it, but all the arithmetic throughout the function make a great place for off-by-one errors and describing it is harder than it should be. I should have known I'm in trouble when I added that inner loop :)

On performance ..

Unfortunately while loops filled with arithmetic are not the most efficient thing to write in Python. The other solutions accept that reality and use list comprehensions or filtering to push the heavy lifting down into the Python runtime. This seems to me to be the right thing to do.

On the other hand, I'm quite certain that my solution would perform a lot better than most if this were C. The inner while loop is O(log r) and it mutates the result in-place and the same O(log r). It does not consume additional stack frames and does not consume any memory besides the result and two variables. But obviously this is not C, so none of this really matters.

Generating non-consecutive combinations

5 Answers

Let me try to explain how this works.

On performance ..