38
votes

Duplicate:

Unique random numbers in O(1)?

I want a pseudo-random number generator that can generate numbers with no repeats in a random order.

For example:

random(10)

might return 5, 9, 1, 4, 2, 8, 3, 7, 6, 10

Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?


Edit:

Also, I want it to be efficient at generating big numbers without generating the entire range.


Edit:

I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes+), then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it reached a specified length, right? Is there no better mathematical algorithm for this?

28
There are PRNGs that do not repeat until the entire cycle is over; any of them that use the last generated number as seed for the next have that property. - derobert
Regarding your edit: if you used a regular RNG and added numbers into a set, how much memory do you think that would use? Same amount as if you generated the list of numbers ahead of time... - David Z
Derobert, what are the names of some of them? That is exactly the type of solution I am looking for. - Unknown
This really isn't a duplicate, since the OP has indicated that he wants to generate a small set of uniques from a very large range. Shuffling isn't appropriate in that case. - Bill the Lizard
@Bill the correct (as opposed to the accepted) answer in that question doesn't require shuffling. - Pete Kirkham

28 Answers

29
votes

You may be interested in a linear feedback shift register (LFSR). We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits XORed and fed back to the input. If you pick just the right "taps", you can get a sequence as long as the register allows: 2^n - 1 values for an n-bit register. That is, a 16-bit LFSR can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the LFSR, you will find examples of how to construct them properly (which is to say, "maximal length").
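Here is a minimal sketch of a 16-bit Galois LFSR in Python. It assumes the taps 16, 14, 13, 11 (feedback mask 0xB400), which is a commonly cited maximal-length choice. The default seed 0xACE1 is arbitrary. It must be non-zero, because zero is a stuck state.

```python
def lfsr16(seed=0xACE1):
    """Yield successive states of a 16-bit Galois LFSR (taps 16, 14, 13, 11).

    With a maximal-length mask the state visits every non-zero 16-bit value
    exactly once before returning to the seed (period 65535).
    """
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400  # feedback mask for x^16 + x^14 + x^13 + x^11 + 1
        if state == seed:
            return  # full cycle completed, the next value would repeat
```

Each state is produced from the previous one with a shift and at most one XOR. Consecutive outputs are strongly correlated, so this suits "no repeats" rather than statistical quality.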

18
votes

A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
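A minimal sketch of the unbiased (Durstenfeld) form of the Fisher-Yates shuffle. The `rng` parameter is an illustrative addition so a seeded generator can be passed in. The naive, biased variant picks `j` from the whole list at every step instead of from `[0, i]`.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Shuffle a list in place and return it.

    Walking i from the end down to 1, swap items[i] with a uniformly chosen
    index j in [0, i]. Allowing j == i is what keeps every permutation
    equally likely.
    """
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)
        items[i], items[j] = items[j], items[i]
    return items
```

For the question's example, `fisher_yates_shuffle(list(range(1, 11)))` returns 1..10 in a random order.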

16
votes

In order to ensure that the list doesn't repeat, it would have to keep a list of numbers previously returned. As it has to therefore generate the entire list by the end of the algorithm, this is equivalent in storage requirement to generating the ordered list and then shuffling.

More about shuffling here: Creating a random ordered list from an ordered list

However, if the range of the random numbers is very large but the quantity of numbers required is small (you've hinted in a comment that this is the actual requirement), then generating a complete list and shuffling it is wasteful. A shuffle on a huge array involves accessing pages of virtual memory in a way that (by definition) will defeat the OS's paging system (on a smaller scale, the same problem would occur with the CPU's memory cache).

In this case, searching the list-so-far will be much more efficient. So the ideal would be to use heuristics (determined by experiment) to pick the right implementation for the given arguments. (Apologies for giving examples in C# rather than C++ but ASFAC++B I'm training myself to think in C#).

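IEnumerable<int> GenerateRandomNumbers(int range, int quantity)
{
    // Random(range), Shuffle() and Threshold are helpers assumed to exist elsewhere
    int[] a = new int[quantity];

    if (range < Threshold)
    {
        // small range: build the whole range, shuffle it, keep the first 'quantity'
        int[] all = new int[range];
        for (int n = 0; n < range; n++)
            all[n] = n;

        Shuffle(all);
        Array.Copy(all, a, quantity);
    }
    else
    {
        // large range: draw numbers and reject any that were already used
        HashSet<int> used = new HashSet<int>();

        for (int n = 0; n < quantity; n++)
        {
            int r = Random(range);

            while (!used.Add(r))
                r = Random(range);

            a[n] = r;
        }
    }

    return a;
}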

The cost of doing the checking for repeated numbers, the looping while there are collisions, etc. will be expensive, but there will likely be some Threshold value where it becomes faster than allocating for the entire range.

For sufficiently small quantity requirements, it may be faster to use an array for used and do linear searches in it, due to the greater locality, lower overhead, the cheapness of the comparison...

Also for large quantities AND large ranges, it might be preferable to return an object that produces the numbers in the sequence on request, instead of allocating the array for the results upfront. This is very easy to implement in C# thanks to the yield return keyword:

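IEnumerable<int> ForLargeQuantityAndRange(int quantity, int range)
{
    HashSet<int> used = new HashSet<int>(); // numbers handed out so far

    for (int n = 0; n < quantity; n++)
    {
        int r = Random(range);

        while (!used.Add(r)) // Add returns false if r was already used
            r = Random(range);

        yield return r;
    }
}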
7
votes

If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).

5
votes

I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.

Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc. in turn.

There are a huge number of hashes like this. They're not too hard to generate if the range is restricted to a power of 2, but any base can be used.

Here's one that would work, for example, if you wanted to go through all 2^32 32-bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.

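#include <stdint.h>

/* Every step is invertible mod 2^32 (multiplying by an odd constant,
 * adding a constant, and x ^= x >> k), so the whole hash is a bijection.
 * uint32_t makes the implicit mod 2^32 hold on any platform. */
uint32_t reversableHash(uint32_t x)
{
   x*=0xDEADBEEF;
   x=x^(x>>17);
   x*=0x01234567;
   x+=0x88776655;
   x=x^(x>>4);
   x=x^(x>>9);
   x*=0x91827363;
   x=x^(x>>7);
   x=x^(x>>11);
   x=x^(x>>20);
   x*=0x77773333;
   return x;
}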
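To check that this hash really is reversible, here is a Python port with an explicit inverse. The helper names are illustrative. The inverse undoes each step in reverse order: odd multipliers have modular inverses (`pow(m, -1, 2**32)`, Python 3.8+), and `x ^= x >> k` can be undone by repeated substitution. Because the mapping is a bijection, feeding in 0, 1, 2, ... never produces a repeat.

```python
MASK32 = 0xFFFFFFFF

def reversible_hash(x):
    """Python port of the 32-bit hash above (all arithmetic mod 2^32)."""
    x = (x * 0xDEADBEEF) & MASK32
    x ^= x >> 17
    x = (x * 0x01234567) & MASK32
    x = (x + 0x88776655) & MASK32
    x ^= x >> 4
    x ^= x >> 9
    x = (x * 0x91827363) & MASK32
    x ^= x >> 7
    x ^= x >> 11
    x ^= x >> 20
    x = (x * 0x77773333) & MASK32
    return x

def _undo_xorshift(y, k):
    # Invert y = x ^ (x >> k): each pass recovers k more high bits of x
    x = y
    for _ in range(32 // k + 1):
        x = y ^ (x >> k)
    return x

def _inv(m):
    return pow(m, -1, 1 << 32)  # inverse of an odd multiplier mod 2^32

def reversible_hash_inverse(x):
    """Undo reversible_hash one step at a time, last step first."""
    x = (x * _inv(0x77773333)) & MASK32
    x = _undo_xorshift(x, 20)
    x = _undo_xorshift(x, 11)
    x = _undo_xorshift(x, 7)
    x = (x * _inv(0x91827363)) & MASK32
    x = _undo_xorshift(x, 9)
    x = _undo_xorshift(x, 4)
    x = (x - 0x88776655) & MASK32
    x = (x * _inv(0x01234567)) & MASK32
    x = _undo_xorshift(x, 17)
    x = (x * _inv(0xDEADBEEF)) & MASK32
    return x
```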
3
votes

If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
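A minimal sketch of the LCG idea, assuming a power-of-two modulus. By the Hull-Dobell theorem, `x -> (a*x + c) mod m` with `m = 2^k` has full period m when c is odd and `a % 4 == 1`, so every value in `[0, m)` appears once per cycle. The defaults `a=5, c=3` are illustrative. Values at or above `n` are skipped, so each number in `[0, n)` comes out exactly once.

```python
def lcg_unique(n, seed=0, a=5, c=3):
    """Yield each integer in [0, n) exactly once, in LCG order."""
    if n <= 0:
        return
    if c % 2 != 1 or a % 4 != 1:
        raise ValueError("need odd c and a % 4 == 1 for a full period mod 2^k")
    m = 4
    while m < n:          # smallest power of two (>= 4) covering the range
        m *= 2
    x = seed % m
    for _ in range(m):    # one full period visits every value in [0, m) once
        if x < n:
            yield x
        x = (a * x + c) % m
```

As the answer warns, the randomness is mediocre. With a power-of-two modulus, the low bits cycle with very short periods.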

3
votes

A shuffle is the best you can do for random numbers in a specific range with no repeats. The method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient because of duplicates. Theoretically, that algorithm might never finish. At best, it will finish in an unpredictable amount of time, whereas a shuffle always runs in a highly predictable amount of time.


If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.

2
votes

What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates; however, the chance of getting one is pretty low.

2
votes

This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
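The block-cipher idea can be sketched with a small Feistel network plus "cycle walking". This is a toy under stated assumptions: the SHA-256 round function, the 4 rounds, and the key `b"example-key"` are illustrative and not a vetted cipher. A real implementation would use an established block cipher or a format-preserving encryption scheme. A Feistel network is a bijection whatever its round function. Re-encrypting any result that falls outside `[0, n)` keeps the mapping a permutation of `[0, n)`, so encrypting 0, 1, 2, ... yields unique values without storing anything.

```python
import hashlib

def _round_function(key, round_index, value, bits):
    # Any deterministic function works here; SHA-256 is just a convenient mixer
    data = key + bytes([round_index]) + value.to_bytes((bits + 7) // 8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)

def feistel_encrypt(x, key, half_bits, rounds=4):
    """Balanced Feistel network on (2 * half_bits)-bit integers; always a bijection."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for r in range(rounds):
        left, right = right, left ^ _round_function(key, r, right, half_bits)
    return (left << half_bits) | right

def permute(x, n, key=b"example-key", rounds=4):
    """Map x in [0, n) to a distinct value in [0, n) via cycle walking."""
    if not 0 <= x < n:
        raise ValueError("x must be in [0, n)")
    half_bits = 1
    while (1 << (2 * half_bits)) < n:
        half_bits += 1
    y = feistel_encrypt(x, key, half_bits, rounds)
    while y >= n:  # outside the range: encrypt again until we land inside
        y = feistel_encrypt(y, key, half_bits, rounds)
    return y
```

For example, `[permute(i, 10) for i in range(10)]` is a permutation of 0..9, and changing the key gives a different one.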

1
votes

If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.

For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?

Of course, if you have a bad random number generator (like the one affected by the Debian OpenSSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
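To put numbers on the birthday-paradox argument, here is a small helper using the standard approximation p ≈ 1 - exp(-k(k-1) / (2·2^bits)) for k uniform `bits`-bit values. The function name is illustrative.

```python
import math

def collision_probability(k, bits):
    """Approximate chance that k uniform random `bits`-bit values contain a repeat."""
    exponent = k * (k - 1) / (2 * 2 ** bits)
    return -math.expm1(-exponent)  # 1 - exp(-exponent), accurate for tiny exponents
```

With 64-bit values, about 2^32 draws already give roughly a 39% chance of a repeat. With 128-bit values, a billion draws give a chance below 10^-20.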

1
votes

As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.

The trade-off is that your list could not be exhaustive in your range: the filter's false positives make it reject some numbers that were never actually generated. If your numbers are truly on the order of 256^1024, that's hardly any trade-off at all.

(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)
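A minimal sketch of the Bloom-filter approach. The `BloomFilter` class, its sizes, and the SHA-256-based hashing are illustrative choices, not a library API. A Bloom filter has no false negatives, so a number that was already produced is always rejected and repeats are impossible. Its false positives are what make the output non-exhaustive. The generator loops forever if `count` exceeds what the filter will still accept.

```python
import hashlib
import random

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = str(item).encode()
        for i in range(self.num_hashes):  # derive k bit positions from salted hashes
            h = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def unique_randoms(count, upper, bloom, rng=random):
    """Yield `count` distinct numbers in [0, upper), using the filter to reject repeats."""
    produced = 0
    while produced < count:
        r = rng.randrange(upper)
        if bloom.might_contain(r):
            continue  # seen before, or a false positive: skip it
        bloom.add(r)
        produced += 1
        yield r
```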

1
votes

I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.

There are some drawbacks, however. First, by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate. That makes LFSRs unusable for cryptography and anywhere else a good RNG is needed. Second, either the all-zero word or the all-one word (in terms of bits) is invalid, depending on the LFSR implementation. Third, and relevant to your question, the maximum number generated by the LFSR is always 2^N - 1 (or 2^N - 2, when the all-ones word is the invalid one).

The first drawback might not be an issue, depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers, so the second issue does not seem relevant to your case. The maximum value (and thus range) problem can be solved by reusing the LFSR until you get a number within your range. Here's an example:

Say you want numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR, which has a range of [1, 15] inclusive. Here's pseudocode for getting a number in the range [1, 10]:

x = LFSR.getRandomNumber();
while (x > 10) {
   x = LFSR.getRandomNumber();
}

You should embed the previous code in your RNG so that the caller doesn't need to care about the implementation. Note that this would slow down your RNG if you use a large shift register and the maximum number you want is well below 2^N - 1.
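A minimal Python sketch of the 4-bit case, with the rejection loop hidden inside the generator. It assumes a Galois LFSR with taps 4 and 3 (mask 0xC, i.e. x^4 + x^3 + 1), which cycles through all 15 non-zero states. The function names are illustrative.

```python
def lfsr4_states(seed=1):
    """Yield the 15 non-zero states of a 4-bit Galois LFSR (taps 4, 3; mask 0xC)."""
    if not 0 < seed < 16:
        raise ValueError("seed must be in 1..15")
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xC
        if state == seed:
            return  # cycle complete

def unique_in_range(limit, seed=1):
    """Yield each value in [1, limit] exactly once, for limit <= 15."""
    if not 1 <= limit <= 15:
        raise ValueError("a 4-bit LFSR only covers 1..15")
    for x in lfsr4_states(seed):
        if x <= limit:  # the same rejection step as the pseudocode above
            yield x
```

Starting from seed 1, `list(unique_in_range(10))` gives `[1, 6, 3, 10, 5, 7, 9, 8, 4, 2]`. The states 11 to 15 are silently skipped.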

1
votes

This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.

There is an "inside-out" version of the Fisher-Yates shuffle algorithm, derived from the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.

One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.

I will give both algorithms further below.

If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.

If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of Fisher-Yates could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose, and insert them into a new array that is a subset. It could do this with just the source and target arrays in situ. This would be very efficient.

This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one, and repeat with the reduced source set until you have chosen enough values. This is essentially the Durstenfeld version of Fisher-Yates in reverse. You can then use the Durstenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill, since they should be randomly chosen and randomly ordered as given here.
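The pick-swap-shrink step described above can be sketched as a partial Fisher-Yates selection. The function name and the `rng` parameter are illustrative. Only `k` swaps are made, so the cost depends on how many values you take, not on the size of the pool. The pool itself is only permuted in place, never copied.

```python
import random

def sample_without_replacement(pool, k, rng=random):
    """Pick k distinct items from `pool` (a list that is permuted in place).

    Each step picks a random index in the not-yet-chosen prefix, swaps that
    item with the top of the prefix, and shrinks the prefix by one.
    The chosen items end up in pool[-k:].
    """
    if not 0 <= k <= len(pool):
        raise ValueError("k must be between 0 and len(pool)")
    n = len(pool)
    for top in range(n - 1, n - 1 - k, -1):
        j = rng.randrange(top + 1)                # random location in the source set
        pool[j], pool[top] = pool[top], pool[j]   # swap it with the top element
    return pool[n - k:]
```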

Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero to setSize - 1, meaning there are setSize possible values. In this case, it will be the size of the array, since the last index of the array is size - 1.

The first algorithm is the Durstenfeld version of the Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length. It simply places the integers from 0 to the length of the array minus 1 into the array at random positions. The array need not be an array of integers; it can be an array of any objects that are acquired sequentially, which effectively makes it an array of reference pointers. It is simple, short and very effective.

int size = someNumber;
int[] array = new int[size]; // here is the array to load
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i < size; i++) { // conveniently, i is also the value to load
    // you can instance or acquire any object at this place in the algorithm to load
    // by reference, into the array and use a pointer to it in place of j
    int j = i; // in this example, j is trivially i
    if (i == 0) { // first integer goes into first location
        array[i] = j; // this may get swapped from here later
    } else { // subsequent integers go into random locations
        // the next random location will be somewhere in the locations
        // already used or a new one at the end
        // here we get the next random location
        // to preserve true randomness without a significant bias
        // it is REALLY IMPORTANT that the newest value could be
        // stored in the newest location, that is,
        // location has to be able to randomly have the value i
        int location = nextInt(i + 1); // a random value between 0 and i
        // move the random location's value to the new location
        array[i] = array[location];
        array[location] = j; // put the new value into the random location
    } // end if...else
} // end for

Voila, you now have an already randomized array.

If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.

type[] array = new type[size];

// some code that loads array...

// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
for (int i = array.length - 1; i > 0; i--) {
    int location = nextInt(i + 1); // a random index between 0 and i
    temp = array[i];
    array[i] = array[location];
    array[location] = temp;
} // end for

For sequenced collections and sets, i.e. some type of list object, you could just use adds or inserts with an index value that allows you to insert items anywhere. However, it has to allow adding or appending after the current last item, to avoid creating bias in the randomization.

0
votes

Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.

0
votes

Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code. This assumes the upper limit remains constant: you always call it with the same max, e.g. unique_random(10), and don't switch between unique_random(10) and unique_random(11).

The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.

#include <stdlib.h>

/* the function returns a random number between 0 and max - 1
 * not necessarily unique
 * I assume it's written (named random_below here so it does not clash
 * with the POSIX random() declared in <stdlib.h>)
 */
int random_below(int max);

/* swap two numbers */
void swap(int *x, int *y);

/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
    static int *list = NULL;    /* contains a list of numbers we haven't returned */
    static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
                                 * 1 --> we have started randomizing numbers
                                 */

    static int count;           /* index of the last number not yet returned */
    static int prev_max = 0;

    // initialize the list
    if (!in_progress || (prev_max != max)) {
        if (list != NULL) {
            free(list);
        }
        list = malloc(sizeof(int) * max);
        prev_max = max;
        in_progress = 1;
        count = max - 1;

        int i;
        for (i = max - 1; i >= 0; --i) {
            list[i] = i;
        }
    }

    /* now choose one of the count + 1 numbers still in the list */
    int index = random_below(count + 1);
    int retval = list[index];

    /* now we throw away the returned value.
     * we do this by shortening the list by 1
     * and replacing the element we returned with
     * the last element still in the list
     */
    swap(&list[index], &list[count]);

    /* when the count reaches 0 we start over */
    if (count == 0) {
        in_progress = 0;
        free(list);
        list = NULL;
    } else { /* reduce the counter by 1 */
        count--;
    }

    return retval;
}

/* swap two numbers */
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}
0
votes

Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.

0
votes

Suppose you wanted to generate a series of 256 random numbers without repeats.

  1. Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
  2. Your looping variable will be n, the number of numbers yet to be generated
  3. Loop from n = 256 to n = 1
  4. Generate a random number r in the range [0, n)
  5. Find the r-th zero bit in your memory block b, let's call it p
  6. Put p in your list of results, an array called q
  7. Flip the p-th bit in memory block b to 1
  8. After the n = 1 pass, you are done generating your list of numbers

Here's a short example of what I am talking about, using n = 4 initially:

**Setup**
b = 0000
q = []

**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]

**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]

**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]

**Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
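A minimal Python sketch of the steps above. It uses a Python int as the bit block `b`, with bit i meaning "i already emitted". The function name and the injectable `rand_below` parameter are illustrative. Finding the r-th zero bit is a linear scan here, so the whole run is O(n^2). Passing the fixed picks 2, 2, 0, 0 reproduces the worked example.

```python
import random

def bitmap_permutation(size, rand_below=None):
    """Return 0..size-1 in random order using a bitset of already-used values."""
    if rand_below is None:
        rand_below = random.randrange
    used = 0                      # the memory block b, initially all zeros
    result = []                   # the result list q
    for n in range(size, 0, -1):  # n = numbers still to be generated
        r = rand_below(n)         # random r in [0, n)
        zeros_seen = 0
        for p in range(size):     # find the r-th (0-based) zero bit
            if not (used >> p) & 1:
                if zeros_seen == r:
                    break
                zeros_seen += 1
        used |= 1 << p            # flip bit p to 1
        result.append(p)
    return result
```

For example, `picks = iter([2, 2, 0, 0])` followed by `bitmap_permutation(4, lambda n: next(picks))` returns `[2, 3, 0, 1]`, the same `q` as the walkthrough.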
0
votes

Please check answers at

Generate sequence of integers in random order without constructing the whole list upfront

and also my answer lies there as

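 A very simple random sequence is 1+((power(r,x)-1) mod p). As x runs from 1 to p - 1, it takes each value from 1 to p - 1 exactly once, provided p is prime and r is a primitive root modulo p. Choosing r as a prime different from p is not enough: for example, with p = 11, r = 3 starts repeating after 5 steps.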
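A small check of the power-sequence idea, using Python's three-argument `pow` for modular exponentiation. The function names are illustrative, and the primitive-root test is brute force, which is only practical for small p.

```python
def power_sequence(r, p):
    """Yield 1 + ((r**x - 1) mod p) for x = 1 .. p-1."""
    for x in range(1, p):
        yield 1 + ((pow(r, x, p) - 1) % p)

def is_primitive_root(r, p):
    # r is a primitive root mod prime p iff r^x mod p hits every value in 1..p-1
    return len({pow(r, x, p) for x in range(1, p)}) == p - 1
```

With p = 11, r = 2 gives `[2, 4, 8, 5, 10, 9, 7, 3, 6, 1]`, a full permutation of 1..10. r = 3 only produces 5 distinct values.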
0
votes

I asked a similar question before but mine was for the whole range of a int see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/

0
votes
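#include <unordered_set>
static std::unordered_set<long> s;
long l;
do {
    l = generator();                 // roll a candidate
} while (s.find(l) != s.end());      // reroll while it was already returned
s.insert(l);                         // remember it; l is now unique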

Here generator() is your random number generator. You keep rolling numbers as long as the result is already in your set, then you add the new one to it. You get the idea.

I did it with long for the example, but you should make that a template if your PRNG is templatized.

An alternative is to use a cryptographically secure PRNG, which has a very low probability of generating the same number twice.

0
votes

If you don't mind poor statistical properties of the generated sequence, there is one method:

Let's say you want to generate N numbers, each 1024 bits long. You can sacrifice some bits of each generated number to act as a "counter".

So you generate each random number, but into some bits you have chosen you put a binary-encoded counter (a variable you increment each time the next random number is generated).

You can split that counter into single bits and put them into some of the less significant bits of the generated number.

That way, you are sure to get a unique number each time.

For example, each generated number looks like this: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx where the xs are taken directly from the generator and the ys are taken from the counter variable.
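A minimal sketch of the counter-embedding trick. It uses Python's `secrets.randbits` for the "x" bits. The function names and the chosen bit positions are illustrative. Two different counter values differ in at least one of the fixed positions, so the resulting numbers can never be equal.

```python
import secrets

def counter_embedded_random(counter, total_bits, counter_positions):
    """Return a total_bits-bit random number whose bits at counter_positions
    spell out `counter` (bit i of the counter goes to counter_positions[i])."""
    if not 0 <= counter < 1 << len(counter_positions):
        raise ValueError("counter does not fit in the chosen positions")
    if any(not 0 <= pos < total_bits for pos in counter_positions):
        raise ValueError("counter positions must lie inside the number")
    value = secrets.randbits(total_bits)           # the "x" bits
    for i, pos in enumerate(counter_positions):
        bit = (counter >> i) & 1                   # the "y" bits
        value = (value & ~(1 << pos)) | (bit << pos)
    return value

def extract_counter(value, counter_positions):
    """Read the embedded counter back out (handy for checking uniqueness)."""
    return sum(((value >> pos) & 1) << i for i, pos in enumerate(counter_positions))
```

With c counter bits you can produce up to 2^c guaranteed-unique numbers. For 1024-bit numbers, spending 32 bits on the counter still leaves 992 random bits.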

0
votes

Mersenne twister

Description of which can be found here on Wikipedia: Mersenne twister

Look at the bottom of the page for implementations in various languages.

0
votes

The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).

Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR, throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers, so space consumption is O(1) too. If you cycle before getting N numbers, then M is less than N (or the LFSR is constructed incorrectly).

You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf

Here's some java code:

/**
 * Generate a sequence of unique "random" numbers in [0,M)
 * @author dkoes
 */

public class UniqueRandom {
    long lfsr;
    long mask;
    long max;

private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
        null, // 0
        null, // 1
        null, // 2
        {3,2}, //3
        {4,3},
        {5,3},
        {6,5},
        {7,6},
        {8,6,5,4},
        {9,5},
        {10,7},
        {11,9},
        {12,6,4,1},
        {13,4,3,1},
        {14,5,3,1},
        {15,14},
        {16,15,13,4},
        {17,14},
        {18,11},
        {19,6,2,1},
        {20,17},
        {21,19},
        {22,21},
        {23,18},
        {24,23,22,17},
        {25,22},
        {26,6,2,1},
        {27,5,2,1},
        {28,25},
        {29,27},
        {30,6,4,1},
        {31,28},
        {32,22,2,1},
        {33,20},
        {34,27,2,1},
        {35,33},
        {36,25},
        {37,5,4,3,2,1},
        {38,6,5,1},
        {39,35},
        {40,38,21,19},
        {41,38},
        {42,41,20,19},
        {43,42,38,37},
        {44,43,18,17},
        {45,44,42,41},
        {46,45,26,25},
        {47,42},
        {48,47,21,20},
        {49,40},
        {50,49,24,23},
        {51,50,36,35},
        {52,49},
        {53,52,38,37},
        {54,53,18,17},
        {55,31},
        {56,55,35,34},
        {57,50},
        {58,39},
        {59,58,38,37},
        {60,59},
        {61,60,46,45},
        {62,61,6,5},
        {63,62},
};

//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
    max = m;
    lfsr = seed; //could easily pass a starting point instead
    //figure out number of bits
    int bits = 0;
    long b = m;
    while((b >>>= 1) != 0)
    {
        bits++;
    }
    bits++;

    if(bits < 3)
        bits = 3; 

    mask = 0;
    for(int i = 0; i < taps[bits].length; i++)
    {
        mask |= (1L << (taps[bits][i]-1));
    }

}

//return -1 if we've cycled
long next()
{
    long ret = -1;
    if(lfsr == 0)
        return -1;
    do {
        ret = lfsr;
        //update lfsr - from wikipedia
        long lsb = lfsr & 1;
        lfsr >>>= 1;
        if(lsb == 1)
            lfsr ^= mask;

        if(lfsr == seed)            
            lfsr = 0; //cycled, stick

        ret--; //zero is stuck state, never generated so sub 1 to get it
    } while(ret >= max);

    return ret;
}

}

0
votes

Here is a way to pick random items without repeating results. It also works for strings. It's in C#, but the logic should work in many places. Put the random results in a list and check whether each new random element is already in that list. If not, then you have a new random element. If it is, repeat the draw until you get an element that is not in that list.

List<string> Erledigte = new List<string>(); // "Erledigte" (German: "done") holds the items already picked
private void Form1_Load(object sender, EventArgs e)
{
    label1.Text = "";
    listBox1.Items.Add("a");
    listBox1.Items.Add("b");
    listBox1.Items.Add("c");
    listBox1.Items.Add("d");
    listBox1.Items.Add("e");
}

private void button1_Click(object sender, EventArgs e)
{
    Random rand = new Random();
    int index=rand.Next(0, listBox1.Items.Count);
    string rndString = listBox1.Items[index].ToString();

    if (listBox1.Items.Count <= Erledigte.Count)
    {
        return;
    }
    else
    {
        if (Erledigte.Contains(rndString))
        {
            //MessageBox.Show("vorhanden"); // "vorhanden" = "already present"
            while (Erledigte.Contains(rndString))
            {
                index = rand.Next(0, listBox1.Items.Count);
                rndString = listBox1.Items[index].ToString();
            }
        }

        Erledigte.Add(rndString);
        label1.Text += rndString;
    }
}
0
votes

For a sequence to be random, there should not be any autocorrelation. The restriction that the numbers should not repeat means the next number depends on all the previous numbers, which means it is not random anymore....

-1
votes

If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.

#include <cstddef>
#include <vector>

const std::size_t amount = 100; // a limited amount of random numbers
std::vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // increments of up to 250 between numbers
numbers.push_back( myrandom( spread ) ); // myrandom(k) assumed to return 0..k-1
for( std::size_t n = 1; n != amount; ++n ) { // the first number is already in
    const short int increment = 1 + myrandom( spread ); // never 0, so no repeats
    numbers.push_back( numbers.back() + increment );
}

myshuffle( numbers );

The myrandom and myshuffle functions I hereby generously delegate to others :)
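A minimal Python sketch of the increment idea, with `random.randrange` and `random.shuffle` standing in for the delegated `myrandom` and `myshuffle`. The function name is illustrative. Keeping every increment at least 1 is what guarantees distinct values. The trade-off is that the values cluster in a band of width at most `amount * spread`, rather than being spread over the whole number space.

```python
import random

def increasing_then_shuffled(amount, spread, rng=random):
    """Build `amount` distinct integers from positive random increments, then shuffle.

    Each increment is in [1, spread], so the running total strictly increases
    and no value can repeat; the shuffle hides the increasing order.
    """
    numbers = []
    total = 0
    for _ in range(amount):
        total += 1 + rng.randrange(spread)  # increment is never 0
        numbers.append(total)
    rng.shuffle(numbers)
    return numbers
```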

-1
votes

To get non-repeated random numbers, and to avoid wasting time checking for duplicates and drawing new numbers over and over, use the method below, which keeps the number of calls to Rand to a minimum. For example, if you want 100 non-repeated random numbers:

  1. Fill an array with the numbers 1 to 100.
  2. Get a random number using the Rand function in the range 1-100.
  3. Use the generated random number as an index to get the value from the array (Numbers[IndexGeneratedFromRandFunction]).
  4. Shift the numbers in the array after that index one place to the left, which removes the chosen value.
  5. Repeat from step 2, but now the range should be 1-99, and so on.
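The steps above can be sketched in Python with `list.pop(index)`, which removes the chosen entry and shifts the later ones left, as in step 4. The function name is illustrative. Each draw needs exactly one random number, but the shifting makes the whole run O(n^2). Swapping the chosen entry with the last one instead (a Fisher-Yates shuffle) avoids that cost.

```python
import random

def draw_without_repeats(count, rng=random):
    """Return the numbers 1..count in random order by drawing from a shrinking pool."""
    pool = list(range(1, count + 1))           # step 1: fill the array
    result = []
    while pool:
        index = rng.randrange(len(pool))       # steps 2 and 5: range shrinks each time
        result.append(pool.pop(index))         # steps 3 and 4: take it, shift the rest left
    return result
```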

-1
votes

Now we have an array with different numbers!

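#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define COUNT 10 /* how many distinct numbers to draw, from 1..COUNT */

int main(void) {
    int b[COUNT];
    srand((unsigned) time(NULL));
    for (int i = 0; i < COUNT; i++) {
        int a = rand() % COUNT + 1;   /* candidate in 1..COUNT */
        int j = 0;
        while (j < i) {
            if (a == b[j]) {          /* duplicate: draw again and rescan from the start */
                a = rand() % COUNT + 1;
                j = -1;
            }
            j++;
        }
        b[i] = a;
    }
    for (int i = 0; i < COUNT; i++)
        printf("%d ", b[i]);
    printf("\n");
    return 0;
}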