3 votes

I have some sorted/scored lists of parameters. I'd like to generate the possible combinations of parameters (the Cartesian product). However, if the number of parameters is large, the number of combinations quickly (very quickly!) becomes enormous. Basically, I'd like to do a Cartesian product, but stop early.

import itertools

parameter_options = ['1234',
                     '123',
                     '1234']
for parameter_set in itertools.product(*parameter_options):
    print(''.join(parameter_set))

generates:

111
112
113
114
121
122
123
124
131
132
133
134
...

I'd like to generate (or something similar):

111
112
121
211
122
212
221
222
...

So that if I stop early, I'd at least get a couple of "good" sets of parameters, where a good set of parameters comes mostly from early in the lists. This particular order would be fine, but I am interested in any technique that changes the "next permutation" choice order. I'd like the early results to contain mostly items from the front of the lists, but I don't really care whether a solution generates 113 or 122 first, or whether 211 or 112 comes first.

My plan is to stop after some number of combinations have been generated (maybe 10K or so? Depends on results). So if there are fewer combinations than the cutoff, all of them should ultimately be generated, and preferably each only once. A sketch of that truncation is below.
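For illustration, here is a minimal sketch of that truncation using itertools.islice; the 10000 cutoff is just a placeholder, and itertools.product stands in for whatever generator ends up producing the desired order:

import itertools

parameter_options = ['1234',
                     '123',
                     '1234']
# Stop after a fixed number of results; any generator with a better
# ordering can be dropped in place of itertools.product here.
for parameter_set in itertools.islice(itertools.product(*parameter_options), 10000):
    print(''.join(parameter_set))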

Comments:

What exactly do you want here? That particular order, or just some other order? – miradulo

What did you end up using? – keepAlive

3 Answers

3 votes

I think you can get your results in the order you want if you think of the output in terms of a graph traversal of the output space. You want a nearest-first traversal, while itertools.product produces a depth-first traversal.

Try something like this:

import heapq

def nearest_first_product(*sequences):
    '''Yield the Cartesian product of the sequences, nearest tuples first.'''
    start = (0,) * len(sequences)  # indexes of the first item of each sequence
    queue = [(0, start)]           # min-heap of (priority, index tuple) pairs
    seen = {start}
    while queue:
        priority, indexes = heapq.heappop(queue)
        yield tuple(seq[index] for seq, index in zip(sequences, indexes))
        # Enqueue each neighbor reached by advancing one index by one step.
        for i in range(len(sequences)):
            if indexes[i] < len(sequences[i]) - 1:
                lst = list(indexes)
                lst[i] += 1
                new_indexes = tuple(lst)
                if new_indexes not in seen:
                    # Priority: squared Euclidean distance from the origin
                    # of the index space.
                    new_priority = sum(index * index for index in new_indexes)
                    heapq.heappush(queue, (new_priority, new_indexes))
                    seen.add(new_indexes)

Example output:

for tup in nearest_first_product(range(1, 5), range(1, 4), range(1, 5)):
    print(tup)

(1, 1, 1)
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
(1, 2, 2)
(2, 1, 2)
(2, 2, 1)
(2, 2, 2)
(1, 1, 3)
(1, 3, 1)
(3, 1, 1)
(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)
(2, 2, 3)
(2, 3, 2)
(3, 2, 2)
(1, 3, 3)
(3, 1, 3)
(3, 3, 1)
(1, 1, 4)
(2, 3, 3)
(3, 2, 3)
(3, 3, 2)
(4, 1, 1)
(1, 2, 4)
(2, 1, 4)
(4, 1, 2)
(4, 2, 1)
(2, 2, 4)
(4, 2, 2)
(3, 3, 3)
(1, 3, 4)
(3, 1, 4)
(4, 1, 3)
(4, 3, 1)
(2, 3, 4)
(3, 2, 4)
(4, 2, 3)
(4, 3, 2)
(3, 3, 4)
(4, 3, 3)
(4, 1, 4)
(4, 2, 4)
(4, 3, 4)

You can get a bunch of slightly different orders by changing the calculation of new_priority in the code. The current version uses the squared Euclidean distance between index tuples as the priority, but you could use some other value if you wanted to (for instance, one that incorporates the values from the sequences, not only the indexes). A generalized version is sketched below.
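Here is a hypothetical generalization (not part of the original answer) that takes the priority calculation as a key function. One caveat: for the traversal to yield results in priority order, the key must never decrease when an index is incremented; both sum and the squared-distance calculation above satisfy that.

import heapq

def nearest_first_product_by(key, *sequences):
    # Hypothetical variant: `key` maps an index tuple to its priority.
    # The key must be non-decreasing as any index grows, or results may
    # be yielded out of order.
    start = (0,) * len(sequences)
    queue = [(key(start), start)]
    seen = {start}
    while queue:
        _, indexes = heapq.heappop(queue)
        yield tuple(seq[i] for seq, i in zip(sequences, indexes))
        for i in range(len(sequences)):
            if indexes[i] < len(sequences[i]) - 1:
                nxt = indexes[:i] + (indexes[i] + 1,) + indexes[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(queue, (key(nxt), nxt))

For example, nearest_first_product_by(sum, range(1, 5), range(1, 4), range(1, 5)) orders results by Manhattan distance instead.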

If you don't care too much about whether (1, 1, 3) comes before (1, 2, 2) (so long as they both come after (1, 1, 2), (1, 2, 1), and (2, 1, 1)), you could probably do a breadth-first traversal instead of nearest-first, as sketched below. That would be a bit simpler, as you could use a regular FIFO queue (like a collections.deque) rather than a priority queue.
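A minimal sketch of that breadth-first variant, assuming the same kind of sequence arguments as above:

from collections import deque

def breadth_first_product(*sequences):
    # Same traversal as nearest_first_product, but a FIFO queue replaces
    # the heap, so results come out layer by layer (grouped by the sum
    # of their indexes).
    start = (0,) * len(sequences)
    queue = deque([start])
    seen = {start}
    while queue:
        indexes = queue.popleft()
        yield tuple(seq[i] for seq, i in zip(sequences, indexes))
        for i in range(len(sequences)):
            if indexes[i] < len(sequences[i]) - 1:
                nxt = indexes[:i] + (indexes[i] + 1,) + indexes[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)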

The queue and seen set used by this sort of graph traversal mean that this code uses some memory. However, that is a lot less than producing all the results up front and then sorting them: the maximum memory used is proportional to the surface area of the result space, rather than to its volume.

3 votes

Your question is a bit ambiguous, but reading your comments and the other answers, it seems you want a Cartesian product implementation that does a breadth-first search instead of a depth-first search.

Recently I had the same need, but with the additional requirement that it not store intermediate results in memory. This is very important to me because I am working with a large number of parameters (and thus an extremely big Cartesian product), and any implementation that stores values or makes recursive calls is not viable. As you state in your question, this seems to be your case as well.

As I didn't find an answer that fulfils this requirement, I came up with this solution:

from itertools import combinations

def product(*sequences):
    '''Breadth-first-search Cartesian product.'''
    # Uncomment to accept arbitrary iterables (at the cost of some memory):
    # sequences = tuple(tuple(seq) for seq in sequences)

    def partitions(n, k):
        # Yield every way of writing n as an ordered sum of k non-negative
        # integers ("stars and bars").
        for c in combinations(range(n + k - 1), k - 1):
            yield (b - a - 1 for a, b in zip((-1,) + c, c + (n + k - 1,)))

    max_position = [len(i) - 1 for i in sequences]
    for i in range(sum(max_position)):
        # Visit every index tuple whose indexes sum to i; partitions that
        # point past the end of some sequence are skipped via IndexError.
        for positions in partitions(i, len(sequences)):
            try:
                yield tuple(map(lambda seq, pos: seq[pos], sequences, positions))
            except IndexError:
                continue
    # The single tuple whose index sum equals sum(max_position).
    yield tuple(map(lambda seq, pos: seq[pos], sequences, max_position))
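For instance, run against the question's lists (assuming the product function above is in scope), the first results come out grouped by how deep they reach into the lists:

for tup in product('1234', '123', '1234'):
    print(''.join(tup))

# 111, 112, 121, 211, 113, 122, 131, 212, 221, 311, ...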

In terms of speed, this generator works fine at the beginning but gets slower for the later results, because an ever larger share of the generated partitions point past the end of some sequence and have to be discarded. So, although this implementation is a bit slower, it works as a generator that uses almost no memory and never yields repeated values.

As I mentioned under @Blckknght's answer, the parameters here must also be sequences (subscriptable iterables with a defined length). You can bypass this limitation (sacrificing a bit of memory) by uncommenting the first line of the function, which may be useful if you are working with generators/iterators as parameters; a small example follows.
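For example, with that first line uncommented, generator arguments would work, because they are materialized into tuples up front (gen_a and gen_b here are made up for illustration):

gen_a = (str(i) for i in range(1, 5))  # a generator, not a sequence
gen_b = (str(i) for i in range(1, 4))
for tup in product(gen_a, gen_b):      # requires the first line uncommented
    print(''.join(tup))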

I hope this helps; let me know if it solves your problem.

0 votes

This solution possibly isn't the best, as it briefly forces every combination into memory, but it does work. It just might take a little while for large data sets.

import itertools
import random

parameter_options = ['1234', '123', '1234']  # the lists from the question

# Build the whole product in memory, then sample from it.
all_products = list(itertools.product(*parameter_options))
count = min(100, len(all_products))  # the (maximum) number of results
results = random.sample(all_products, count)

for parameter_set in results:
    print("".join(parameter_set))

This will give you a list of products in a random order.