I'm facing a problem with the reduce_by_key function of the Thrust library. It looks like a bug to me, but I'd like to be sure before reporting it.
First, my setup: CUDA 7.0, Windows 8, NVIDIA GeForce 820M. Everything is compiled with Visual Studio 2010 and nvcc in release mode, 64-bit.
Now, the exercise that illustrates the problem.
I have a vector of random numbers called devData, generated on my device. I tabulate a vector of indices called devIndices of the same size, defined as follows:
- devIndices = [0, 0, 0, 0, 1, 1, 1, 1, ... K-1, K-1, K-1, K-1]
- devData = [ 1, 4, 5, 7, 5, 8, 9, 6, ... 7, 8, 9, 6]
So each value in devIndices is repeated mod = 4 times in this example.
Then I just want to reduce_by_key devData using devIndices to obtain the following reduced vectors:
- devIndices = [0, 1, ..., K-1]
- devData = [17, 28, ..., 30]
(if I got the arithmetic right :) )
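To make the expected result concrete, here is a minimal host-side sketch of the semantics I have in mind: each run of consecutive equal keys collapses into one key, and the matching values are summed. The helper name `reduceByKeyHost` is made up for this post; it is not part of Thrust and is not meant to mirror how Thrust implements the reduction.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side reference (NOT a Thrust function): collapses each run
// of consecutive equal keys into one key and sums the matching values.
std::pair<std::vector<int>, std::vector<float> >
reduceByKeyHost(const std::vector<int>& keys, const std::vector<float>& vals)
{
    std::vector<int> outKeys;
    std::vector<float> outVals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (outKeys.empty() || outKeys.back() != keys[i]) {
            outKeys.push_back(keys[i]);   // a new run of keys starts here
            outVals.push_back(vals[i]);
        } else {
            outVals.back() += vals[i];    // same run: accumulate the value
        }
    }
    return std::make_pair(outKeys, outVals);
}
```

Fed with the first eight elements of the example above (keys 0, 0, 0, 0, 1, 1, 1, 1 and values 1, 4, 5, 7, 5, 8, 9, 6), it returns keys [0, 1] and sums [17, 28].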
Now, I know for sure that the elements of the reduced devIndices should sum to a value T given by the following relation:
- T = (K-1) * K / 2 (e.g. for K = 4: [0, 1, 2, 3] -> 6 = 3 * 4 / 2)
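As a sanity check on that formula, the closed form can be compared against a brute-force sum on the host. This is only a sketch; the names `expectedKeySum` and `bruteForceKeySum` are invented for illustration. It uses 64-bit integers on purpose: for K = 10,000 the expected value is 49,995,000, which fits in a 32-bit int, but for K = 100,000 it would be 4,999,950,000, which does not.

```cpp
// Hypothetical helper: closed-form sum of the keys 0, 1, ..., K-1.
// 64-bit arithmetic keeps the product K * (K - 1) from overflowing
// for the sizes discussed here.
long long expectedKeySum(long long K)
{
    return K * (K - 1) / 2;
}

// Brute-force sum of 0, 1, ..., K-1, used only to cross-check the closed form.
long long bruteForceKeySum(long long K)
{
    long long sum = 0;
    for (long long i = 0; i < K; ++i) {
        sum += i;
    }
    return sum;
}
```

For K = 4 both functions return 6, matching the small example above.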
I tried this on my machine: it works fine for small numbers of elements but fails for large ones (100,000 elements already fails).
Below is the code I use to manipulate my two vectors as described above and output the sum of devIndices at the end. You can play around with the parameter k that basically sets the number of elements.
    #include <cuda.h>
    #include <thrust/random.h>
    #include <thrust/device_vector.h>
    #include <thrust/functional.h>
    #include <thrust/transform.h>
    #include <thrust/tabulate.h>
    #include <thrust/reduce.h>
    #include <thrust/sort.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <iostream>

    typedef thrust::device_vector<int> tDevVecInt;
    typedef thrust::device_vector<float> tDevVecFlt;

    // Maps an index x to a pseudo-random float in [0, 1), seeded per element.
    struct rando : public thrust::unary_function<unsigned int, float>
    {
        unsigned int mainSeed;

        rando(unsigned int _mainSeed) : mainSeed(_mainSeed) {}

        __host__ __device__ float operator()(unsigned int x)
        {
            unsigned int seed = x * mainSeed;
            thrust::random::taus88 mac(seed);
            thrust::uniform_real_distribution<float> dist(0, 1);
            return dist(mac);
        }
    };

    // Maps an index x to its key x / sz, so that each key is repeated sz times.
    struct modSim : public thrust::unary_function<int, int>
    {
        int sz;

        modSim(int in)
        {
            this->sz = in;
        }

        __host__ __device__ int operator()(const int &x)
        {
            return x / sz;
        }
    };

    int main()
    {
        int mod = 10;           // number of repetitions of each key
        int k = 10000;          // number of distinct keys
        int szData = k * mod;   // total number of elements

        tDevVecFlt devData(szData, 0.0f);
        tDevVecInt devIndices(szData, 0);

        // Fill devData with random numbers and devIndices with 0,0,...,1,1,...,k-1.
        thrust::transform(thrust::make_counting_iterator(0), thrust::make_counting_iterator(0) + szData, devData.begin(), rando(123456789));
        thrust::tabulate(devIndices.begin(), devIndices.end(), modSim(mod));

        // Reduce in place: the output ranges are the same vectors as the inputs.
        thrust::reduce_by_key(devIndices.begin(), devIndices.end(), devData.begin(), devIndices.begin(), devData.begin());

        // Sum of the first k reduced keys; expected to be (k-1)*k/2.
        std::cout << thrust::reduce(devIndices.begin(), devIndices.begin() + k, 0) << std::endl;
        return 0;
    }
Worst of all: when I run the same piece of code several times, I get different results! The random vector has nothing to do with this (it is seeded, and I checked it, by the way).
So the question part now:
- Am I wrong somewhere? reduce_by_key seems like the right tool to me.
- Can anyone reproduce this non-deterministic behavior?
- If this is indeed a bug, what is the usual way to report it?