2
votes

I've searched a lot over the internet to find a way to generate random numbers on my CUDA device, within a kernel. The numbers must come from a gaussian distribution.

The best thing I found was from NVIDIA itself. It is the Wallace algorithm, that uses a uniform distribution to build a gaussian one. But the code samples they give lack explanation and I really need to understand how the algorithm goes, especially on the device. For example, they give:

 __device__ void generateRandomNumbers_wallace(  
unsigned seed,  // Initialization seed  
 float *chi2Corrections,  // Set of correction values  
 float *globalPool,  // Input random number pool  
 float *output  // Output random numbers  


    unsigned tid=threadIdx.x;  
    // Load global pool into shared memory.  
     unsigned offset = __mul24(POOL_SIZE, blockIdx.x);  
    for( int i = 0; i < 4; i++ )  
      pool[tid+THREADS*i] = globalPool[offset+TOTAL_THREADS*i+tid];  
    __syncthreads();  
      const unsigned lcg_a=241;  
      const unsigned lcg_c=59;  
      const unsigned lcg_m=256;  
      const unsigned mod_mask = lcg_m-1;  
      seed=(seed+tid)&mod_mask ;  
      // Loop generating outputs repeatedly  
     for( int loop = 0; loop < OUTPUTS_PER_RUN; loop++ )  
      {  
        Transform();  
        unsigned intermediate_address;  
        i_a = __mul24(loop,8*TOTAL_THREADS)+8*THREADS *  
          blockIdx.x + threadIdx.x;  
        float chi2CorrAndScale=chi2Corrections[  
          blockIdx.x * OUTPUTS_PER_RUN + loop];  
        for( i = 0; i < 4; i++ )  
          output[i_a + i*THREADS]=chi2CorrAndScale*pool[tid+THREADS*i];  
    }  

First of all, many of the variables declared aren't even used in the function! And I really don't get what the "8" is for in the second loop. I understand the "4" in the other loops have something to do with the 4x4 orthogonal matrix block, am I right? Could anyone give me a better idea of what is going on here?

Anyway, does anyone have any good code samples I could use? Or does anyone have another way of generating random gaussian numbers in a CUDA kernel? Code samples will be much appreciated.

Thanks!

3

3 Answers

4
votes

You could use CURAND, which is included with the CUDA Toolkit (version 3.2 and later). It'd be far simpler!

A few notes on the code you posted:

  • The Wallace generator transforms Gaussian to Gaussian (i.e. not Uniform to Gaussian)
  • CUDA code has two implicit variables: blockIdx and threadIdx - these define the block index and thread index with a block, see the CUDA Programming Guide for more information
  • The code uses __mul24, on sm_20 and later this is actually slower than "ordinary" 32-bit multiplication so I would avoid it (even on older architectures for simplicity)
1
votes

The Fast Walsh Hadamard transform is done by patterns of addition and subtraction. Hence the central limit theorem applies. An array of uniform random numbers that undergoes a Walsh Hadamard transformation will have a Gaussian/Normal distribution. There are some slight technical details about that. The algorithm was not discovered by Wallace. It was first published in Servo Magazine around 1993/1994 by myself. I have code about the Walsh Hadamard transform at www.code.google.com/p/lemontree Regards, Sean O'Connor