I'm getting a strange result when summing all the elements of a matrix with thrust::reduce. It works for most inputs, but for one data set the GPU sum does not match the CPU sum.
Code:
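#include <iostream>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>

lbfgsfloatval_t sum(const DeviceVector& A) {
    const size_t n = A.rows() * A.cols();
    // Wrap the raw device pointer and copy the data into a device_vector.
    thrust::device_ptr<lbfgsfloatval_t> ptr(A.getPtr());
    thrust::device_vector<lbfgsfloatval_t> A_p(ptr, ptr + n);

    // Reduce on the host: copy once, then add sequentially.
    thrust::host_vector<lbfgsfloatval_t> A_h = A_p;
    lbfgsfloatval_t sums = 0.0;
    for (size_t i = 0; i < n; i++)
        sums += A_h[i];

    // Reduce on the device.
    lbfgsfloatval_t res = thrust::reduce(A_p.begin(), A_p.end());

    std::cout << "cpu: " << sums << std::endl;
    std::cout << "gpu: " << res << std::endl;
    return res;
}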
Output (note that the CPU and GPU results disagree in the second pair):
cpu: -568.691
gpu: -568.691
cpu: 3.4972e-14
gpu: 1.40998e-14
cpu: 0.234375
gpu: 0.234375
I also tried skipping the thrust::device_vector and reducing over the raw device pointer directly, with the same output. A cuBLAS dot product gives the same output as well.
I used MATLAB to confirm that the CPU result above is correct.
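To check whether summation order alone could produce a gap like this, here is a minimal host-only sketch in plain C++ (no Thrust). The helper names sequential_sum and pairwise_sum and the input values are made up for illustration and are not from my code. The pairwise version only imitates the tree-shaped order a parallel reduction might use; how thrust::reduce actually groups its additions is an assumption on my part.

```cpp
#include <cstddef>
#include <vector>

// Left-to-right sum, the same order as the host loop in the question.
double sequential_sum(const std::vector<double>& v) {
    double acc = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        acc += v[i];
    return acc;
}

// Tree-shaped sum over the half-open range [lo, hi): split the range in
// half, sum each half recursively, then add the two partial sums.
// This is only a stand-in for the order a parallel reduction might use.
double pairwise_sum(const std::vector<double>& v, std::size_t lo, std::size_t hi) {
    if (hi <= lo)
        return 0.0;          // empty range
    if (hi - lo == 1)
        return v[lo];        // single element
    const std::size_t mid = lo + (hi - lo) / 2;
    return pairwise_sum(v, lo, mid) + pairwise_sum(v, mid, hi);
}
```

With the made-up input {1e20, 1.0, -1e20, 1.0}, whose exact sum is 2, sequential_sum returns 1 and pairwise_sum(v, 0, v.size()) returns 0 in IEEE double precision (assuming the compiler is not run with options such as -ffast-math). So when large values nearly cancel, two valid summation orders can disagree on a near-zero result without any underflow being involved.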
What is going on here? Is this an underflow on the GPU? Thanks!