3 votes

I have some sorted/scored lists of parameters. I'd like to generate the possible combinations of parameters (the Cartesian product). However, if the number of parameters is large, the number of combinations quickly (very quickly!) becomes enormous. Basically, I'd like to do a Cartesian product, but stop early.

import itertools

parameter_options = ['1234',
                     '123',
                     '1234']
for parameter_set in itertools.product(*parameter_options):
    print(''.join(parameter_set))

generates:

111
112
113
114
121
122
123
124
131
132
133
134
...

I'd like to generate (or something similar):

111
112
121
211
122
212
221
222
...

So that if I stop early, I'd at least get a couple of "good" sets of parameters, where a good set of parameters comes mostly from early in the lists. This particular order would be fine, but I am interested in any technique that changes the "next permutation" choice order. I'd like the early results to contain mostly items from the front of the lists, but I don't really care whether a solution generates 113 or 122 first, or whether 211 or 112 comes first.

My plan is to stop after some number of combinations have been generated (maybe 10K or so? Depends on results). So if there are fewer combinations than the cutoff, all of them should ultimately be generated, and preferably each only once. A sketch of that truncation is below.
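For illustration, here is a minimal sketch of that truncation using itertools.islice; the 10000 cutoff is just a placeholder, and itertools.product stands in for whatever generator ends up producing the desired order:

import itertools

parameter_options = ['1234',
                     '123',
                     '1234']
# Stop after a fixed number of results; any generator with a better
# ordering can be dropped in place of itertools.product here.
for parameter_set in itertools.islice(itertools.product(*parameter_options), 10000):
    print(''.join(parameter_set))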

Comments:

What exactly do you want here? That particular order, or just some other order? – miradulo

What did you end up using? – keepAlive

3 Answers

3 votes

I think you can get your results in the order you want if you think of the output in terms of a graph traversal of the output space. You want a nearest-first traversal, while itertools.product produces a depth-first traversal.

Try something like this:

import heapq

def nearest_first_product(*sequences):
    '''Yield the Cartesian product of the sequences, nearest tuples first.'''
    start = (0,) * len(sequences)  # indexes of the first item of each sequence
    queue = [(0, start)]           # min-heap of (priority, index tuple) pairs
    seen = {start}
    while queue:
        priority, indexes = heapq.heappop(queue)
        yield tuple(seq[index] for seq, index in zip(sequences, indexes))
        # Enqueue each neighbor reached by advancing one index by one step.
        for i in range(len(sequences)):
            if indexes[i] < len(sequences[i]) - 1:
                lst = list(indexes)
                lst[i] += 1
                new_indexes = tuple(lst)
                if new_indexes not in seen:
                    # Priority: squared Euclidean distance from the origin
                    # of the index space.
                    new_priority = sum(index * index for index in new_indexes)
                    heapq.heappush(queue, (new_priority, new_indexes))
                    seen.add(new_indexes)

Example output:

for tup in nearest_first_product(range(1, 5), range(1, 4), range(1, 5)):
    print(tup)

(1, 1, 1)
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
(1, 2, 2)
(2, 1, 2)
(2, 2, 1)
(2, 2, 2)
(1, 1, 3)
(1, 3, 1)
(3, 1, 1)
(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)
(2, 2, 3)
(2, 3, 2)
(3, 2, 2)
(1, 3, 3)
(3, 1, 3)
(3, 3, 1)
(1, 1, 4)
(2, 3, 3)
(3, 2, 3)
(3, 3, 2)
(4, 1, 1)
(1, 2, 4)
(2, 1, 4)
(4, 1, 2)
(4, 2, 1)
(2, 2, 4)
(4, 2, 2)
(3, 3, 3)
(1, 3, 4)
(3, 1, 4)
(4, 1, 3)
(4, 3, 1)
(2, 3, 4)
(3, 2, 4)
(4, 2, 3)
(4, 3, 2)
(3, 3, 4)
(4, 3, 3)
(4, 1, 4)
(4, 2, 4)
(4, 3, 4)

You can get a bunch of slightly different orders by changing the calculation of new_priority in the code. The current version uses the squared Euclidean distance between index tuples as the priority, but you could use some other value if you wanted to (for instance, one that incorporates the values from the sequences, not only the indexes). A generalized version is sketched below.
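Here is a hypothetical generalization (not part of the original answer) that takes the priority calculation as a key function. One caveat: for the traversal to yield results in priority order, the key must never decrease when an index is incremented; both sum and the squared-distance calculation above satisfy that.

import heapq

def nearest_first_product_by(key, *sequences):
    # Hypothetical variant: `key` maps an index tuple to its priority.
    # The key must be non-decreasing as any index grows, or results may
    # be yielded out of order.
    start = (0,) * len(sequences)
    queue = [(key(start), start)]
    seen = {start}
    while queue:
        _, indexes = heapq.heappop(queue)
        yield tuple(seq[i] for seq, i in zip(sequences, indexes))
        for i in range(len(sequences)):
            if indexes[i] < len(sequences[i]) - 1:
                nxt = indexes[:i] + (indexes[i] + 1,) + indexes[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(queue, (key(nxt), nxt))

For example, nearest_first_product_by(sum, range(1, 5), range(1, 4), range(1, 5)) orders results by Manhattan distance instead.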

If you don't care too much about whether (1, 1, 3) comes before (1, 2, 2) (so long as they both come after (1, 1, 2), (1, 2, 1), and (2, 1, 1)), you could probably do a breadth-first traversal instead of nearest-first, as sketched below. That would be a bit simpler, as you could use a regular FIFO queue (like a collections.deque) rather than a priority queue.
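A minimal sketch of that breadth-first variant, assuming the same kind of sequence arguments as above:

from collections import deque

def breadth_first_product(*sequences):
    # Same traversal as nearest_first_product, but a FIFO queue replaces
    # the heap, so results come out layer by layer (grouped by the sum
    # of their indexes).
    start = (0,) * len(sequences)
    queue = deque([start])
    seen = {start}
    while queue:
        indexes = queue.popleft()
        yield tuple(seq[i] for seq, i in zip(sequences, indexes))
        for i in range(len(sequences)):
            if indexes[i] < len(sequences[i]) - 1:
                nxt = indexes[:i] + (indexes[i] + 1,) + indexes[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)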

The queue and seen set used by this sort of graph traversal mean that this code uses some memory. However, that is a lot less than producing all the results up front and then sorting them: the maximum memory used is proportional to the surface area of the result space, rather than to its volume.

3 votes

Your question is a bit ambiguous, but reading your comments and the other answers, it seems you want a Cartesian product implementation that does a breadth-first search instead of a depth-first search.

Recently I had the same need, but with the additional requirement that it not store intermediate results in memory. This is very important to me because I am working with a large number of parameters (and thus an extremely big Cartesian product), and any implementation that stores values or makes recursive calls is not viable. As you state in your question, this seems to be your case as well.

As I didn't find an answer that fulfils this requirement, I came up with this solution:

from itertools import combinations

def product(*sequences):
    '''Breadth-first-search Cartesian product.'''
    # Uncomment to accept arbitrary iterables (at the cost of some memory):
    # sequences = tuple(tuple(seq) for seq in sequences)

    def partitions(n, k):
        # Yield every way of writing n as an ordered sum of k non-negative
        # integers ("stars and bars").
        for c in combinations(range(n + k - 1), k - 1):
            yield (b - a - 1 for a, b in zip((-1,) + c, c + (n + k - 1,)))

    max_position = [len(i) - 1 for i in sequences]
    for i in range(sum(max_position)):
        # Visit every index tuple whose indexes sum to i; partitions that
        # point past the end of some sequence are skipped via IndexError.
        for positions in partitions(i, len(sequences)):
            try:
                yield tuple(map(lambda seq, pos: seq[pos], sequences, positions))
            except IndexError:
                continue
    # The single tuple whose index sum equals sum(max_position).
    yield tuple(map(lambda seq, pos: seq[pos], sequences, max_position))
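For instance, run against the question's lists (assuming the product function above is in scope), the first results come out grouped by how deep they reach into the lists:

for tup in product('1234', '123', '1234'):
    print(''.join(tup))

# 111, 112, 121, 211, 113, 122, 131, 212, 221, 311, ...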

In terms of speed, this generator works fine at the beginning but gets slower for the later results, because an ever larger share of the generated partitions point past the end of some sequence and have to be discarded. So, although this implementation is a bit slower, it works as a generator that uses almost no memory and never yields repeated values.

As I mentioned under @Blckknght's answer, the parameters here must also be sequences (subscriptable iterables with a defined length). You can bypass this limitation (sacrificing a bit of memory) by uncommenting the first line of the function, which may be useful if you are working with generators/iterators as parameters; a small example follows.
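For example, with that first line uncommented, generator arguments would work, because they are materialized into tuples up front (gen_a and gen_b here are made up for illustration):

gen_a = (str(i) for i in range(1, 5))  # a generator, not a sequence
gen_b = (str(i) for i in range(1, 4))
for tup in product(gen_a, gen_b):      # requires the first line uncommented
    print(''.join(tup))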

I hope this helps; let me know if it solves your problem.

0 votes

This solution possibly isn't the best, as it briefly forces every combination into memory, but it does work. It just might take a little while for large data sets.

import itertools
import random

parameter_options = ['1234', '123', '1234']  # the lists from the question

# Build the whole product in memory, then sample from it.
all_products = list(itertools.product(*parameter_options))
count = min(100, len(all_products))  # the (maximum) number of results
results = random.sample(all_products, count)

for parameter_set in results:
    print("".join(parameter_set))

This will give you a list of products in a random order.