Efficient random permutation of n-set-bits

Question

For the problem of producing a uniformly random bit pattern with exactly n set bits, I know of two practical methods, but they both have limitations I'm not happy with.

First, you can enumerate all of the possible word values which have that many bits set in a pre-computed table, and then generate a random index into that table to pick out a possible result. This has the problem that as the output size grows the list of candidate outputs eventually becomes impractically large.

Alternatively, you can pick n non-overlapping bit positions at random (for example, by using a partial Fisher-Yates shuffle) and set those bits only. This approach, however, computes a random state in a much larger space than the number of possible results. For example, it may choose the first and second bits out of three, or it might, separately, choose the second and first bits.

This second approach must consume more bits from the random number source than are strictly required. Since it is choosing n bits in a specific order when their order is unimportant, this means that it is making an arbitrary distinction between n! different ways of producing the same result, and consuming at least floor(log_2(n!)) more bits than are necessary.

Can this be avoided?

There is obviously a third approach of iteratively computing and counting off the legal permutations until a random index is reached, but that's simply a space-for-time trade-off on the first approach, and isn't directly helpful unless there is an efficient way to jump straight to the permutation at a given index rather than stepping through them one by one.
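To make that third approach concrete, here is a minimal sketch of the counting-off loop. It steps between patterns with the bit trick commonly known as Gosper's hack, which is my choice for illustration and not something the approach depends on. The names next_same_popcount and nth_pattern are hypothetical, and a 32-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Gosper's hack: smallest value greater than v with the same number of
   set bits. The result is meaningless once v is already the highest
   pattern, because v + c wraps around. */
uint32_t next_same_popcount(uint32_t v)
{
    assert(v != 0);
    uint32_t c = v & (~v + 1u);       /* lowest set bit of v */
    uint32_t r = v + c;               /* carry the lowest run of ones upward */
    return (((r ^ v) >> 2) / c) | r;  /* put the leftover ones back at the bottom */
}

/* Count off patterns with n set bits, starting from the lowest one,
   until position `index` (0-based) is reached. The caller must keep
   index below the number of patterns, C(32, n). */
uint32_t nth_pattern(unsigned n, uint64_t index)
{
    assert(n >= 1 && n <= 32);
    uint32_t v = (n == 32) ? UINT32_MAX : (UINT32_C(1) << n) - 1u;
    while (index-- > 0)
        v = next_same_popcount(v);
    return v;
}
```

The loop runs index times, so the cost grows with the size of the solution space, which is exactly the trade-off described above.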

Clarification

The first approach requires picking a single random number between zero and $\frac{w!}{n!\,(w-n)!}$ (where w is the output size), as this is the number of possible solutions.

The second approach requires picking n random values between zero and w-1, zero and w-2, etc., and these have a product of $\frac{w!}{(w-n)!}$, which is $n!$ times larger than the first approach.
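As a worked instance of those two counts, take the earlier example of choosing two bits out of three (w = 3, n = 2), followed by the ratio for a case I picked only for scale (n = 16):

$$
\frac{w!}{n!\,(w-n)!} = \frac{3!}{2!\,1!} = 3,
\qquad
\frac{w!}{(w-n)!} = \frac{3!}{1!} = 6 = 2! \cdot 3
$$

$$
16! \approx 2.09 \times 10^{13} \approx 2^{44.25},
\qquad
\lfloor \log_2 16! \rfloor = 44
$$

So the ordered selection distinguishes twice as many states as there are results in the small case, and roughly 44 bits' worth of extra states when n = 16.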

This means that the random number source has been forced to produce bits to distinguish n! different results which are all equivalent. I'd like to know if there's an efficient method to avoid relying on this superfluous randomness. Perhaps by using an algorithm which produces an un-ordered list of bit positions, or by directly computing the nth unique permutation of bits.

I am not sure I see your problem with the Fisher-Yates algorithm. If you want, say, five set bits then start with 11111000 ... 000 and do a partial Fisher-Yates shuffle for the first five positions. — rossum
@rossum, The problem with Fisher-Yates is that it consumes more than its fair share of entropy to choose a solution. I'm looking for a method which can solve the problem using only the bare minimum of random input to cover the entire solution space. Since the number of solutions is w! / ((w-n)! * n!) (where w is the total size of the output), an ideal algorithm would consider only that many possibilities, whereas Fisher-Yates considers w! / (w-n)! possibilities separately. — sh1
You only need enough raw entropy to seed a PRNG. Use the PRNG to do the actual shuffle, not the raw entropy. For the next shuffle reseed the PRNG with some more raw entropy. That spreads out the raw entropy and makes less of a demand on your entropy source. — rossum
@rossum: That's another approach. You should post it as an answer. — tmyklebu
Still mulling this over from 30000 feet, but it seems to me that the binomial-based iterations are essentially choosing a random number of bits to skip between set bits; only, the distribution of the results is not uniform. On top of thinking about culling common factors from that operation (and saving those for subsequent bits), I just noticed the paper linked from this answer to an almost-unrelated problem. I'm wondering if there's an angle, there. — sh1

Lee Daniel Crocker · Accepted Answer · 2013-06-09T18:45:15

Seems like you want a variant of Floyd's algorithm:

Algorithm to select a single, random combination of values?

Should be especially useful in your case, because the containment test is a simple bitmask operation. This will require only k calls to the RNG. In the code below, I assume you have randint(limit), which produces a uniform random integer from 0 to limit-1, and that you want k bits set in a 32-bit int:

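uint32_t mask = 0;
for (unsigned j = 32 - k; j < 32; ++j) {
    unsigned r = randint(j + 1);        /* uniform in 0..j */
    uint32_t b = 1u << r;               /* unsigned: 1 << 31 overflows a signed int */
    if (mask & b) mask |= (1u << j);    /* bit r already taken, so take bit j */
    else mask |= b;
}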

How many bits of entropy you need here depends on how randint() is implemented. If k > 16, run the loop with 32 - k instead and take the bitwise complement (~mask) of the result, so the loop never needs more than 16 RNG calls.
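Putting the loop and the complement trick together, a self-contained sketch might look like the following. The randint() here is only a stand-in built on rand() with rejection sampling, and the name random_k_bits is made up for this example. Substitute whatever random source you actually trust.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for randint(limit): uniform integer in 0..limit-1.
   Assumption: rand() is good enough for a demo. Values at or above
   the largest multiple of limit are rejected to avoid modulo bias. */
static uint32_t randint(uint32_t limit)
{
    assert(limit > 0 && limit <= (uint32_t)RAND_MAX);
    uint32_t range = (uint32_t)RAND_MAX + 1u;
    uint32_t bound = range - range % limit;
    uint32_t r;
    do {
        r = (uint32_t)rand();
    } while (r >= bound);
    return r % limit;
}

/* Floyd's algorithm on a bitmask: a 32-bit word with exactly k bits set. */
uint32_t random_k_bits(unsigned k)
{
    assert(k <= 32);
    /* For k > 16, choose the 32-k clear bits instead and complement. */
    unsigned m = (k > 16) ? 32 - k : k;
    uint32_t mask = 0;
    for (unsigned j = 32 - m; j < 32; ++j) {
        uint32_t b = UINT32_C(1) << randint(j + 1);
        if (mask & b)
            b = UINT32_C(1) << j;   /* position already taken: use bit j */
        mask |= b;
    }
    return (k > 16) ? (uint32_t)~mask : mask;
}
```

For example, random_k_bits(5) returns some word with five set bits, and random_k_bits(27) makes only five RNG calls as well, because it picks the five clear bits.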

Your alternative suggestion of generating a single random number representing one combination among the set (mathematicians would call this the rank of the combination) is simpler if you use colex order rather than lexicographic order. This code, for example:

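for (i = k; i >= 1; --i) {
    /* step n down to the largest value with binomial(n, i) <= r */
    while ((b = binomial(n, i)) > r) --n;
    buf[i-1] = n;   /* n is the i-th smallest index of the combination */
    r -= b;         /* remaining rank among the smaller indices */
}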

will fill the array buf[] with the indices (each from 0 to n-1) of the k-combination at colex rank r, provided r is less than binomial(n, k) on entry. In your case, n starts at the word size, k is the number of set bits, and you'd replace buf[i-1] = n with mask |= (1u << n). The binomial() function is the binomial coefficient, which I compute with a lookup table (see this). That would make the most efficient use of entropy, but I still think Floyd's algorithm would be a better compromise.
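As a rough, self-contained version of that unranking loop for a 32-bit mask, something like the sketch below should work. The function name unrank_k_bits is hypothetical, and binomial() is computed directly here only to keep the example short; a lookup table would be the faster choice.

```c
#include <assert.h>
#include <stdint.h>

/* Binomial coefficient C(n, i) for n <= 32, computed multiplicatively.
   After step t, c holds C(n - i + t, t), so each division is exact. */
static uint64_t binomial(unsigned n, unsigned i)
{
    if (i > n) return 0;
    if (i > n - i) i = n - i;
    uint64_t c = 1;
    for (unsigned t = 1; t <= i; ++t)
        c = c * (n - i + t) / t;
    return c;
}

/* The 32-bit word with k set bits at colex rank r, 0 <= r < C(32, k). */
uint32_t unrank_k_bits(unsigned k, uint64_t r)
{
    assert(k <= 32 && r < binomial(32, k));
    uint32_t mask = 0;
    unsigned n = 32;
    for (unsigned i = k; i >= 1; --i) {
        uint64_t b;
        /* largest n with C(n, i) <= r; always ends below 32 for valid r */
        while ((b = binomial(n, i)) > r)
            --n;
        mask |= UINT32_C(1) << n;
        r -= b;
    }
    return mask;
}
```

To pick a random pattern you would draw r uniformly from 0 to C(32, k) - 1 and pass it in. Colex rank order happens to match the numeric order of the resulting masks, so rank 0 with k = 3 gives 0x7 and the last rank gives 0xE0000000.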
