Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
- What is a simple solution?
- What is an effective solution to reduce memory usage or run on a slower CPU?
This is equivalent to Adam Rosenfield's solution, but may be a bit more clear for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.
int rand7()
{
int vals[5][5] = {
{ 1, 2, 3, 4, 5 },
{ 6, 7, 1, 2, 3 },
{ 4, 5, 6, 7, 1 },
{ 2, 3, 4, 5, 6 },
{ 7, 0, 0, 0, 0 }
};
int result = 0;
while (result == 0)
{
int i = rand5();
int j = rand5();
result = vals[i-1][j-1];
}
return result;
}
How does it work? Think of it like this: imagine printing out this double-dimension array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it's a statistically random value between 1 and 7, since there are an equal number of non-zero values to choose from. If you hit a zero, just keep throwing the dart until you hit a non-zero. That's what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don't get a good result, we keep throwing darts.
Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. :)
There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.:
int i;
do
{
i = 5 * (rand5() - 1) + rand5(); // i is now uniformly random between 1 and 25
} while(i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1; // result is now uniformly random between 1 and 7
This has an expected runtime of 25/21 = 1.19 iterations of the loop, but there is an infinitesimally small probability of looping forever.
I'd like to add another answer, in addition to my first answer. This answer attempts to minimize the number of calls to rand5()
per call to rand7()
, to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.
The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2 N. Thus, rand5()
has approximately 2.32193 bits of entropy, and rand7()
has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5()
, and apply them to generating 2.80735 bits of entropy needed for each call to rand7()
. The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5()
per call to rand7()
.
Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5()
will be assumed to return numbers in the range [0, 4], and rand7()
will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.
So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number -- we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a
1a
2a
3..., where each digit ai
is chosen by a call to rand5()
. For example, if our RNG chose ai
= 1 for all i
, then ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/52 + 1/53 + ... = 1/4 (sum of a geometric series).
Ok, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a
, b
] equals the length of that interval, b - a
. The proof is left as an exercise for the reader =).
Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7()
. How do we do this? Just the reverse of what we just did -- we convert it to an infinitely precise decimal in base 7, and then each base 7 digit will correspond to one output of rand7()
.
Taking the example from earlier, if our rand5()
produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.
Ok, so we have the main idea, but we have two problems left: we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Secondly, how do we actually convert it to base 7?
One way we can convert a number between 0 and 1 to base 7 is as follows:
To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5()
twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5()
produce, the random real number we're generating will never be larger than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.
So, keeping track of the current number so far, and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k
digits, then we can safely output the next k
digits -- regardless of what the infinite stream of base 5 digits are, they will never affect the next k
digits of the base 7 representation!
And that's the algorithm -- to generate the next output of rand7()
, we generate only as many digits of rand5()
as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness:
import random
rand5_calls = 0
def rand5():
global rand5_calls
rand5_calls += 1
return random.randint(0, 4)
def rand7_gen():
state = 0
pow5 = 1
pow7 = 7
while True:
if state / pow5 == (state + pow7) / pow5:
result = state / pow5
state = (state - result * pow5) * 7
pow7 *= 7
yield result
else:
state = 5 * state + pow7 * rand5()
pow5 *= 5
if __name__ == '__main__':
r7 = rand7_gen()
N = 10000
x = list(next(r7) for i in range(N))
distr = [x.count(i) for i in range(7)]
expmean = N / 7.0
expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))
print '%d TRIALS' % N
print 'Expected mean: %.1f' % expmean
print 'Expected standard deviation: %.1f' % expstddev
print
print 'DISTRIBUTION:'
for i in range(7):
print '%d: %d (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev)
print
print 'Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N)
Note that rand7_gen()
returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7)
10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.
Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember here, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).
In one run of this, I made 12091 calls to rand5()
for 10000 calls to rand7()
, achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.
In order to port this code to a language that doesn't have arbitrarily large integers built-in, you'll have to cap the values of pow5
and pow7
to the maximum value of your native integral type -- if they get too big, then reset everything and start over. This will increase the average number of calls to rand5()
per call to rand7()
very slightly, but hopefully it shouldn't increase too much even for 32- or 64-bit integers.
(I have stolen Adam Rosenfeld's answer and made it run about 7% faster.)
Assume that rand5() returns one of {0,1,2,3,4} with equal distribution and the goal is return {0,1,2,3,4,5,6} with equal distribution.
int rand7() {
i = 5 * rand5() + rand5();
max = 25;
//i is uniform among {0 ... max-1}
while(i < max%7) {
//i is uniform among {0 ... (max%7 - 1)}
i *= 5;
i += rand5(); //i is uniform {0 ... (((max%7)*5) - 1)}
max %= 7;
max *= 5; //once again, i is uniform among {0 ... max-1}
}
return(i%7);
}
We're keeping track of the largest value that the loop can make in the variable max
. If the reult so far is between max%7 and max-1 then the result will be uniformly distrubuted in that range. If not, we use the remainder, which is random between 0 and max%7-1, and another call to rand() to make a new number and a new max. Then we start again.
Edit: Expect number of times to call rand5() is x in this equation:
x = 2 * 21/25
+ 3 * 4/25 * 14/20
+ 4 * 4/25 * 6/20 * 28/30
+ 5 * 4/25 * 6/20 * 2/30 * 7/10
+ 6 * 4/25 * 6/20 * 2/30 * 3/10 * 14/15
+ (6+x) * 4/25 * 6/20 * 2/30 * 3/10 * 1/15
x = about 2.21 calls to rand5()
Algorithm:
7 can be represented in a sequence of 3 bits
Use rand(5) to randomly fill each bit with 0 or 1.
For e.g: call rand(5) and
if the result is 1 or 2, fill the bit with 0
if the result is 4 or 5, fill the bit with 1
if the result is 3 , then ignore and do it again (rejection)
This way we can fill 3 bits randomly with 0/1 and thus get a number from 1-7.
EDIT: This seems like the simplest and most efficient answer, so here's some code for it:
public static int random_7() {
int returnValue = 0;
while (returnValue == 0) {
for (int i = 1; i <= 3; i++) {
returnValue = (returnValue << 1) + random_5_output_2();
}
}
return returnValue;
}
private static int random_5_output_2() {
while (true) {
int flip = random_5();
if (flip < 3) {
return 0;
}
else if (flip > 3) {
return 1;
}
}
}
rand7() = (rand5()+rand5()+rand5()+rand5()+rand5()+rand5()+rand5())%7+1
Edit: That doesn't quite work. It's off by about 2 parts in 1000 (assuming a perfect rand5). The buckets get:
value Count Error%
1 11158 -0.0035
2 11144 -0.0214
3 11144 -0.0214
4 11158 -0.0035
5 11172 +0.0144
6 11177 +0.0208
7 11172 +0.0144
By switching to a sum of
n Error%
10 +/- 1e-3,
12 +/- 1e-4,
14 +/- 1e-5,
16 +/- 1e-6,
...
28 +/- 3e-11
seems to gain an order of magnitude for every 2 added
BTW: the table of errors above was not generated via sampling but by the following recurrence relation:
p[x,n]
is the number waysoutput=x
can happen givenn
calls torand5
.
p[1,1] ... p[5,1] = 1
p[6,1] ... p[7,1] = 0
p[1,n] = p[7,n-1] + p[6,n-1] + p[5,n-1] + p[4,n-1] + p[3,n-1]
p[2,n] = p[1,n-1] + p[7,n-1] + p[6,n-1] + p[5,n-1] + p[4,n-1]
p[3,n] = p[2,n-1] + p[1,n-1] + p[7,n-1] + p[6,n-1] + p[5,n-1]
p[4,n] = p[3,n-1] + p[2,n-1] + p[1,n-1] + p[7,n-1] + p[6,n-1]
p[5,n] = p[4,n-1] + p[3,n-1] + p[2,n-1] + p[1,n-1] + p[7,n-1]
p[6,n] = p[5,n-1] + p[4,n-1] + p[3,n-1] + p[2,n-1] + p[1,n-1]
p[7,n] = p[6,n-1] + p[5,n-1] + p[4,n-1] + p[3,n-1] + p[2,n-1]
The following produces a uniform distribution on {1, 2, 3, 4, 5, 6, 7} using a random number generator producing a uniform distribution on {1, 2, 3, 4, 5}. The code is messy, but the logic is clear.
public static int random_7(Random rg) {
int returnValue = 0;
while (returnValue == 0) {
for (int i = 1; i <= 3; i++) {
returnValue = (returnValue << 1) + SimulateFairCoin(rg);
}
}
return returnValue;
}
private static int SimulateFairCoin(Random rg) {
while (true) {
int flipOne = random_5_mod_2(rg);
int flipTwo = random_5_mod_2(rg);
if (flipOne == 0 && flipTwo == 1) {
return 0;
}
else if (flipOne == 1 && flipTwo == 0) {
return 1;
}
}
}
private static int random_5_mod_2(Random rg) {
return random_5(rg) % 2;
}
private static int random_5(Random rg) {
return rg.Next(5) + 1;
}
int rand7() {
int value = rand5()
+ rand5() * 2
+ rand5() * 3
+ rand5() * 4
+ rand5() * 5
+ rand5() * 6;
return value%7;
}
Unlike the chosen solution, the algorithm will run in constant time. It does however make 2 more calls to rand5 than the average run time of the chosen solution.
Note that this generator is not perfect (the number 0 has 0.0064% more chance than any other number), but for most practical purposes the guarantee of constant time probably outweighs this inaccuracy.
Explanation
This solution is derived from the fact that the number 15,624 is divisible by 7 and thus if we can randomly and uniformly generate numbers from 0 to 15,624 and then take mod 7 we can get a near-uniform rand7 generator. Numbers from 0 to 15,624 can be uniformly generated by rolling rand5 6 times and using them to form the digits of a base 5 number as follows:
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
Properties of mod 7 however allow us to simplify the equation a bit:
5^5 = 3 mod 7
5^4 = 2 mod 7
5^3 = 6 mod 7
5^2 = 4 mod 7
5^1 = 5 mod 7
So
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
becomes
rand5 * 3 + rand5 * 2 + rand5 * 6 + rand5 * 4 + rand5 * 5 + rand5
Theory
The number 15,624 was not chosen randomly, but can be discovered using fermat's little theorem, which states that if p is a prime number then
a^(p-1) = 1 mod p
So this gives us,
(5^6)-1 = 0 mod 7
(5^6)-1 is equal to
4 * 5^5 + 4 * 5^4 + 4 * 5^3 + 4 * 5^2 + 4 * 5 + 4
This is a number in base 5 form and thus we can see that this method can be used to go from any random number generator to any other random number generator. Though a small bias towards 0 is always introduced when using the exponent p-1.
To generalize this approach and to be more accurate we can have a function like this:
def getRandomconverted(frm, to):
s = 0
for i in range(to):
s += getRandomUniform(frm)*frm**i
mx = 0
for i in range(to):
mx = (to-1)*frm**i
mx = int(mx/to)*to # maximum value till which we can take mod
if s < mx:
return s%to
else:
return getRandomconverted(frm, to)
If we consider the additional constraint of trying to give the most efficient answer i.e one that given an input stream, I
, of uniformly distributed integers of length m
from 1-5 outputs a stream O
, of uniformly distributed integers from 1-7 of the longest length relative to m
, say L(m)
.
The simplest way to analyse this is to treat the streams I and O
as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream a1, a2, a3,... -> a1+5*a2+5^2*a3+..
and similarly for stream O
.
Then if we take a section of the input stream of length m choose n s.t. 5^m-7^n=c
where c>0
and is as small as possible. Then there is a uniform map from the input stream of length m to integers from 1
to 5^m
and another uniform map from integers from 1 to 7^n
to the output stream of length n where we may have to lose a few cases from the input stream when the mapped integer exceeds 7^n
.
So this gives a value for L(m)
of around m (log5/log7)
which is approximately .82m
.
The difficulty with the above analysis is the equation 5^m-7^n=c
which is not easy to solve exactly and the case where the uniform value from 1
to 5^m
exceeds 7^n
and we lose efficiency.
The question is how close to the best possible value of m (log5/log7) can be attain. For example when this number approaches close to an integer can we find a way to achieve this exact integral number of output values?
If 5^m-7^n=c
then from the input stream we effectively generate a uniform random number from 0
to (5^m)-1
and don't use any values higher than 7^n
. However these values can be rescued and used again. They effectively generate a uniform sequence of numbers from 1 to 5^m-7^n
. So we can then try to use these and convert them into 7-ary numbers so that we can create more output values.
If we let T7(X)
to be the average length of the output sequence of random(1-7)
integers derived from a uniform input of size X
, and assuming that 5^m=7^n0+7^n1+7^n2+...+7^nr+s, s<7
.
Then T7(5^m)=n0x7^n0/5^m + ((5^m-7^n0)/5^m) T7(5^m-7^n0)
since we have a length no sequence with probability 7^n0/5^m with a residual of length 5^m-7^n0
with probability (5^m-7^n0)/5^m)
.
If we just keep substituting we obtain:
T7(5^m) = n0x7^n0/5^m + n1x7^n1/5^m + ... + nrx7^nr/5^m = (n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/5^m
Hence
L(m)=T7(5^m)=(n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/(7^n0+7^n1+7^n2+...+7^nr+s)
Another way of putting this is:
If 5^m has 7-ary representation `a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r
Then L(m) = (a1*7 + 2a2*7^2 + 3a3*7^3+...+rar*7^r)/(a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r)
The best possible case is my original one above where 5^m=7^n+s
, where s<7
.
Then T7(5^m) = nx(7^n)/(7^n+s) = n+o(1) = m (Log5/Log7)+o(1)
as before.
The worst case is when we can only find k and s.t 5^m = kx7+s.
Then T7(5^m) = 1x(k.7)/(k.7+s) = 1+o(1)
Other cases are somewhere inbetween. It would be interesting to see how well we can do for very large m, i.e. how good can we get the error term:
T7(5^m) = m (Log5/Log7)+e(m)
It seems impossible to achieve e(m) = o(1)
in general but hopefully we can prove e(m)=o(m)
.
The whole thing then rests on the distribution of the 7-ary digits of 5^m
for various values of m
.
I'm sure there is a lot of theory out there that covers this I may have a look and report back at some point.
Here is a working Python implementation of Adam's answer.
import random
def rand5():
return random.randint(1, 5)
def rand7():
while True:
r = 5 * (rand5() - 1) + rand5()
#r is now uniformly random between 1 and 25
if (r <= 21):
break
#result is now uniformly random between 1 and 7
return r % 7 + 1
I like to throw algorithms I'm looking at into Python so I can play around with them, thought I'd post it here in the hopes that it is useful to someone out there, not that it took long to throw together.
Assuming that rand(n) here means "random integer in a uniform distribution from 0 to n-1", here's a code sample using Python's randint, which has that effect. It uses only randint(5), and constants, to produce the effect of randint(7). A little silly, actually
from random import randint
sum = 7
while sum >= 7:
first = randint(0,5)
toadd = 9999
while toadd>1:
toadd = randint(0,5)
if toadd:
sum = first+5
else:
sum = first
assert 7>sum>=0
print sum
The premise behind Adam Rosenfield's correct answer is:
When n equals 2, you have 4 throw-away possibilities: y = {22, 23, 24, 25}. If you use n equals 6, you only have 1 throw-away: y = {15625}.
5^6 = 15625
7 * 2232 = 15624
You call rand5 more times. However, you have a much lower chance of getting a throw-away value (or an infinite loop). If there is a way to get no possible throw-away value for y, I haven't found it yet.
Here's my answer:
static struct rand_buffer {
unsigned v, count;
} buf2, buf3;
void push (struct rand_buffer *buf, unsigned n, unsigned v)
{
buf->v = buf->v * n + v;
++buf->count;
}
#define PUSH(n, v) push (&buf##n, n, v)
int rand16 (void)
{
int v = buf2.v & 0xf;
buf2.v >>= 4;
buf2.count -= 4;
return v;
}
int rand9 (void)
{
int v = buf3.v % 9;
buf3.v /= 9;
buf3.count -= 2;
return v;
}
int rand7 (void)
{
if (buf3.count >= 2) {
int v = rand9 ();
if (v < 7)
return v % 7 + 1;
PUSH (2, v - 7);
}
for (;;) {
if (buf2.count >= 4) {
int v = rand16 ();
if (v < 14) {
PUSH (2, v / 7);
return v % 7 + 1;
}
PUSH (2, v - 14);
}
// Get a number between 0 & 25
int v = 5 * (rand5 () - 1) + rand5 () - 1;
if (v < 21) {
PUSH (3, v / 7);
return v % 7 + 1;
}
v -= 21;
PUSH (2, v & 1);
PUSH (2, v >> 1);
}
}
It's a little more complicated than others, but I believe it minimises the calls to rand5. As with other solutions, there's a small probability that it could loop for a long time.
Simple and efficient:
int rand7 ( void )
{
return 4; // this number has been calculated using
// rand5() and is in the range 1..7
}
(Inspired by What's your favorite "programmer" cartoon?).
As long as there aren't seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. In Perl:
$num = 0;
$possibilities = 1;
sub rand7
{
while( $possibilities < 7 )
{
$num = $num * 5 + int(rand(5));
$possibilities *= 5;
}
my $result = $num % 7;
$num = int( $num / 7 );
$possibilities /= 7;
return $result;
}
I know it has been answered, but is this seems to work ok, but I can not tell you if it has a bias. My 'testing' suggests it is, at least, reasonable.
Perhaps Adam Rosenfield would be kind enough to comment?
My (naive?) idea is this:
Accumulate rand5's until there is enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.
To avoid the accumulator overflowing, and since the accumulator is mod 7 then I take the mod 7 of the accumulator:
(5a + rand5) % 7 = (k*7 + (5a%7) + rand5) % 7 = ( (5a%7) + rand5) % 7
The rand7() function follows:
(I let the range of rand5 be 0-4 and rand7 is likewise 0-6.)
int rand7(){
static int a=0;
static int e=0;
int r;
a = a * 5 + rand5();
e = e + 5; // added 5/7ths of a rand7 number
if ( e<7 ){
a = a * 5 + rand5();
e = e + 5; // another 5/7ths
}
r = a % 7;
e = e - 7; // removed a rand7 number
a = a % 7;
return r;
}
Edit: Added results for 100 million trials.
'Real' rand functions mod 5 or 7
rand5 : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539 rand7 : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046
My rand7
Average looks ok and number distributions look ok too.
randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943
There are elegant algorithms cited above, but here's one way to approach it, although it might be roundabout. I am assuming values generated from 0.
R2 = random number generator giving values less than 2 (sample space = {0, 1})
R8 = random number generator giving values less than 8 (sample space = {0, 1, 2, 3, 4, 5, 6, 7})
In order to generate R8 from R2, you will run R2 thrice, and use the combined result of all 3 runs as a binary number with 3 digits. Here are the range of values when R2 is ran thrice:
0 0 0 --> 0
.
.
1 1 1 --> 7
Now to generate R7 from R8, we simply run R7 again if it returns 7:
int R7() {
do {
x = R8();
} while (x > 6)
return x;
}
The roundabout solution is to generate R2 from R5 (just like we generated R7 from R8), then R8 from R2 and then R7 from R8.
Here's a solution that fits entirely within integers and is within about 4% of optimal (i.e. uses 1.26 random numbers in {0..4} for every one in {0..6}). The code's in Scala, but the math should be reasonably clear in any language: you take advantage of the fact that 7^9 + 7^8 is very close to 5^11. So you pick an 11 digit number in base 5, and then interpret it as a 9 digit number in base 7 if it's in range (giving 9 base 7 numbers), or as an 8 digit number if it's over the 9 digit number, etc.:
abstract class RNG {
def apply(): Int
}
class Random5 extends RNG {
val rng = new scala.util.Random
var count = 0
def apply() = { count += 1 ; rng.nextInt(5) }
}
class FiveSevener(five: RNG) {
val sevens = new Array[Int](9)
var nsevens = 0
val to9 = 40353607;
val to8 = 5764801;
val to7 = 823543;
def loadSevens(value: Int, count: Int) {
nsevens = 0;
var remaining = value;
while (nsevens < count) {
sevens(nsevens) = remaining % 7
remaining /= 7
nsevens += 1
}
}
def loadSevens {
var fivepow11 = 0;
var i=0
while (i<11) { i+=1 ; fivepow11 = five() + fivepow11*5 }
if (fivepow11 < to9) { loadSevens(fivepow11 , 9) ; return }
fivepow11 -= to9
if (fivepow11 < to8) { loadSevens(fivepow11 , 8) ; return }
fivepow11 -= to8
if (fivepow11 < 3*to7) loadSevens(fivepow11 % to7 , 7)
else loadSevens
}
def apply() = {
if (nsevens==0) loadSevens
nsevens -= 1
sevens(nsevens)
}
}
If you paste a test into the interpreter (REPL actually), you get:
scala> val five = new Random5
five: Random5 = Random5@e9c592
scala> val seven = new FiveSevener(five)
seven: FiveSevener = FiveSevener@143c423
scala> val counts = new Array[Int](7)
counts: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0)
scala> var i=0 ; while (i < 100000000) { counts( seven() ) += 1 ; i += 1 }
i: Int = 100000000
scala> counts
res0: Array[Int] = Array(14280662, 14293012, 14281286, 14284836, 14287188,
14289332, 14283684)
scala> five.count
res1: Int = 125902876
The distribution is nice and flat (within about 10k of 1/7 of 10^8 in each bin, as expected from an approximately-Gaussian distribution).
By using a rolling total, you can both
Both these problems are an issue with the simplistic rand(5)+rand(5)...
-type solutions. The following Python code shows how to implement it (most of this is proving the distribution).
import random
x = []
for i in range (0,7):
x.append (0)
t = 0
tt = 0
for i in range (0,700000):
########################################
##### qq.py #####
r = int (random.random () * 5)
t = (t + r) % 7
########################################
##### qq_notsogood.py #####
#r = 20
#while r > 6:
#r = int (random.random () * 5)
#r = r + int (random.random () * 5)
#t = r
########################################
x[t] = x[t] + 1
tt = tt + 1
high = x[0]
low = x[0]
for i in range (0,7):
print "%d: %7d %.5f" % (i, x[i], 100.0 * x[i] / tt)
if x[i] < low:
low = x[i]
if x[i] > high:
high = x[i]
diff = high - low
print "Variation = %d (%.5f%%)" % (diff, 100.0 * diff / tt)
And this output shows the results:
pax$ python qq.py
0: 99908 14.27257
1: 100029 14.28986
2: 100327 14.33243
3: 100395 14.34214
4: 99104 14.15771
5: 99829 14.26129
6: 100408 14.34400
Variation = 1304 (0.18629%)
pax$ python qq.py
0: 99547 14.22100
1: 100229 14.31843
2: 100078 14.29686
3: 99451 14.20729
4: 100284 14.32629
5: 100038 14.29114
6: 100373 14.33900
Variation = 922 (0.13171%)
pax$ python qq.py
0: 100481 14.35443
1: 99188 14.16971
2: 100284 14.32629
3: 100222 14.31743
4: 99960 14.28000
5: 99426 14.20371
6: 100439 14.34843
Variation = 1293 (0.18471%)
A simplistic rand(5)+rand(5)
, ignoring those cases where this returns more than 6 has a typical variation of 18%, 100 times that of the method shown above:
pax$ python qq_notsogood.py
0: 31756 4.53657
1: 63304 9.04343
2: 95507 13.64386
3: 127825 18.26071
4: 158851 22.69300
5: 127567 18.22386
6: 95190 13.59857
Variation = 127095 (18.15643%)
pax$ python qq_notsogood.py
0: 31792 4.54171
1: 63637 9.09100
2: 95641 13.66300
3: 127627 18.23243
4: 158751 22.67871
5: 126782 18.11171
6: 95770 13.68143
Variation = 126959 (18.13700%)
pax$ python qq_notsogood.py
0: 31955 4.56500
1: 63485 9.06929
2: 94849 13.54986
3: 127737 18.24814
4: 159687 22.81243
5: 127391 18.19871
6: 94896 13.55657
Variation = 127732 (18.24743%)
And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7...
stuff:
import random
# rand5() returns 0 through 4 inclusive.
def rand5():
return int (random.random () * 5)
# rand7() generator returns 0 through 6 inclusive (using rand5()).
def rand7():
rand7ret = 0
while True:
rand7ret = (rand7ret + rand5()) % 7
yield rand7ret
# Number of test runs.
count = 700000
# Work out distribution.
distrib = [0,0,0,0,0,0,0]
rgen =rand7()
for i in range (0,count):
r = rgen.next()
distrib[r] = distrib[r] + 1
# Print distributions and calculate variation.
high = distrib[0]
low = distrib[0]
for i in range (0,7):
print "%d: %7d %.5f" % (i, distrib[i], 100.0 * distrib[i] / count)
if distrib[i] < low:
low = distrib[i]
if distrib[i] > high:
high = distrib[i]
diff = high - low
print "Variation = %d (%.5f%%)" % (diff, 100.0 * diff / count)
This answer is more an experiment in obtaining the most entropy possible from the Rand5 function. t is therefore somewhat unclear and almost certainly a lot slower than other implementations.
Assuming the uniform distribution from 0-4 and resulting uniform distribution from 0-6:
public class SevenFromFive
{
public SevenFromFive()
{
// this outputs a uniform ditribution but for some reason including it
// screws up the output distribution
// open question Why?
this.fifth = new ProbabilityCondensor(5, b => {});
this.eigth = new ProbabilityCondensor(8, AddEntropy);
}
private static Random r = new Random();
private static uint Rand5()
{
return (uint)r.Next(0,5);
}
private class ProbabilityCondensor
{
private readonly int samples;
private int counter;
private int store;
private readonly Action<bool> output;
public ProbabilityCondensor(int chanceOfTrueReciprocal,
Action<bool> output)
{
this.output = output;
this.samples = chanceOfTrueReciprocal - 1;
}
public void Add(bool bit)
{
this.counter++;
if (bit)
this.store++;
if (counter == samples)
{
bool? e;
if (store == 0)
e = false;
else if (store == 1)
e = true;
else
e = null;// discard for now
counter = 0;
store = 0;
if (e.HasValue)
output(e.Value);
}
}
}
ulong buffer = 0;
const ulong Mask = 7UL;
int bitsAvail = 0;
private readonly ProbabilityCondensor fifth;
private readonly ProbabilityCondensor eigth;
private void AddEntropy(bool bit)
{
buffer <<= 1;
if (bit)
buffer |= 1;
bitsAvail++;
}
private void AddTwoBitsEntropy(uint u)
{
buffer <<= 2;
buffer |= (u & 3UL);
bitsAvail += 2;
}
public uint Rand7()
{
uint selection;
do
{
while (bitsAvail < 3)
{
var x = Rand5();
if (x < 4)
{
// put the two low order bits straight in
AddTwoBitsEntropy(x);
fifth.Add(false);
}
else
{
fifth.Add(true);
}
}
// read 3 bits
selection = (uint)((buffer & Mask));
bitsAvail -= 3;
buffer >>= 3;
if (selection == 7)
eigth.Add(true);
else
eigth.Add(false);
}
while (selection == 7);
return selection;
}
}
The number of bits added to the buffer per call to Rand5 is currently 4/5 * 2 so 1.6. If the 1/5 probability value is included that increases by 0.05 so 1.65 but see the comment in the code where I have had to disable this.
Bits consumed by call to Rand7 = 3 + 1/8 * (3 + 1/8 * (3 + 1/8 * (...
This is 3 + 3/8 + 3/64 + 3/512 ... so approx 3.42
By extracting information from the sevens I reclaim 1/8*1/7 bits per call so about 0.018
This gives a net consumption 3.4 bits per call which means the ratio is 2.125 calls to Rand5 for every Rand7. The optimum should be 2.1.
I would imagine this approach is significantly slower than many of the other ones here unless the cost of the call to Rand5 is extremely expensive (say calling out to some external source of entropy).
in php
function rand1to7() {
do {
$output_value = 0;
for ($i = 0; $i < 28; $i++) {
$output_value += rand1to5();
}
while ($output_value != 140);
$output_value -= 12;
return floor($output_value / 16);
}
loops to produce a random number between 16 and 127, divides by sixteen to create a float between 1 and 7.9375, then rounds down to get an int between 1 and 7. if I am not mistaken, there is a 16/112 chance of getting any one of the 7 outcomes.
I think I have four answers, two giving exact solutions like that of @Adam Rosenfield but without the infinite loop problem, and other two with almost perfect solution but faster implementation than first one.
The best exact solution requires 7 calls to rand5
, but lets proceed in order to understand.
Strength of Adam's answer is that it gives a perfect uniform distribution, and there is very high probability (21/25) that only two calls to rand5() will be needed. However, worst case is infinite loop.
The first solution below also gives a perfect uniform distribution, but requires a total of 42 calls to rand5
. No infinite loops.
Here is an R implementation:
rand5 <- function() sample(1:5,1)
rand7 <- function() (sum(sapply(0:6, function(i) i + rand5() + rand5()*2 + rand5()*3 + rand5()*4 + rand5()*5 + rand5()*6)) %% 7) + 1
For people not familiar with R, here is a simplified version:
rand7 = function(){
r = 0
for(i in 0:6){
r = r + i + rand5() + rand5()*2 + rand5()*3 + rand5()*4 + rand5()*5 + rand5()*6
}
return r %% 7 + 1
}
The distribution of rand5
will be preserved. If we do the math, each of the 7 iterations of the loop has 5^6 possible combinations, thus total number of possible combinations are (7 * 5^6) %% 7 = 0
. Thus we can divide the random numbers generated in equal groups of 7. See method two for more discussion on this.
Here are all the possible combinations:
table(apply(expand.grid(c(outer(1:5,0:6,"+")),(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
15625 15625 15625 15625 15625 15625 15625
I think it's straight forward to show that Adam's method will run much much faster. The probability that there are 42 or more calls to rand5
in Adam's solution is very small ((4/25)^21 ~ 10^(-17)
).
Now the second method, which is almost uniform, but requires 6 calls to rand5
:
rand7 <- function() (sum(sapply(1:6,function(i) i*rand5())) %% 7) + 1
Here is a simplified version:
rand7 = function(){
r = 0
for(i in 1:6){
r = r + i*rand5()
}
return r %% 7 + 1
}
This is essentially one iteration of method 1. If we generate all possible combinations, here is resulting counts:
table(apply(expand.grid(1:5,(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
2233 2232 2232 2232 2232 2232 2232
One number will appear once more in 5^6 = 15625
trials.
Now, in Method 1, by adding 1 to 6, we move the number 2233 to each of the successive point. Thus the total number of combinations will match up. This works because 5^6 %% 7 = 1, and then we do 7 appropriate variations, so (7 * 5^6 %% 7 = 0).
If the argument of method 1 and 2 is understood, method 3 follows, and requires only 7 calls to rand5
. At this point, I feel this is the minimum number of calls needed for an exact solution.
Here is an R implementation:
rand5 <- function() sample(1:5,1)
rand7 <- function() (sum(sapply(1:7, function(i) i * rand5())) %% 7) + 1
For people not familiar with R, here is a simplified version:
rand7 = function(){
r = 0
for(i in 1:7){
r = r + i * rand5()
}
return r %% 7 + 1
}
The distribution of rand5
will be preserved. If we do the math, each of the 7 iterations of the loop has 5 possible outcomes, thus total number of possible combinations are (7 * 5) %% 7 = 0
. Thus we can divide the random numbers generated in equal groups of 7. See method one and two for more discussion on this.
Here are all the possible combinations:
table(apply(expand.grid(0:6,(1:5)),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
5 5 5 5 5 5 5
I think it's straight forward to show that Adam's method will still run faster. The probability that there are 7 or more calls to rand5
in Adam's solution is still small ((4/25)^3 ~ 0.004
).
This is a minor variation of the the second method. It is almost uniform, but requires 7 calls to rand5
, that is one additional to method 2:
rand7 <- function() (rand5() + sum(sapply(1:6,function(i) i*rand5())) %% 7) + 1
Here is a simplified version:
rand7 = function(){
r = 0
for(i in 1:6){
r = r + i*rand5()
}
return (r+rand5()) %% 7 + 1
}
If we generate all possible combinations, here is resulting counts:
table(apply(expand.grid(1:5,(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6,1:5),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
11160 11161 11161 11161 11161 11161 11160
Two numbers will appear once less in 5^7 = 78125
trials. For most purposes, I can live with that.
7 * rand5() / 5
? – kiwixz