In a CUDA C project, I would like to use the Thrust library to find the maximum element in an array of floats. The function thrust::max_element() seems to be what I need. The array is the output of a CUDA kernel (which seems to work fine), so it is already in device memory when I call thrust::max_element(). I am not very familiar with Thrust, but after reading the documentation for thrust::max_element() and the answers to similar questions on this site, I thought I understood how it works. Unfortunately, I get wrong results, so I am probably not using the library functions correctly. Can somebody please tell me what is wrong with my code?
float* deviceArray;
float* max;
int length = 1025;
*max = 0.0f;
size_t size = length * sizeof(float);
cudaMalloc(&deviceArray, size);
cudaMemset(deviceArray, 0, size);
// here I launch a CUDA kernel which modifies deviceArray
thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(deviceArray);
*max = *(thrust::max_element(d_ptr, d_ptr + length));
I use the following headers:
#include <thrust/extrema.h>
#include <thrust/device_ptr.h>
I keep getting zero values for *max even though I am sure that deviceArray contains non-zero values after running the kernel. I am using nvcc as a compiler (CUDA 7.0) and I am running the code on a device with compute capability 3.5.
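For comparison, here is a minimal self-contained sketch of the flow described above. It rests on some assumptions: fillKernel is a hypothetical stand-in for the real kernel (it writes i into element i), and deviceMax is an illustrative helper name, not part of Thrust. The sketch differs from the snippet above in two ways. The result is stored in a plain float, because writing through an uninitialized float* is undefined behavior. The kernel launch is also checked for errors before the array is read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/extrema.h>

// Hypothetical stand-in for the real kernel: element i receives the value i.
__global__ void fillKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = static_cast<float>(i);
}

// Illustrative helper: returns the largest value in a device array.
float deviceMax(float* deviceArray, int length)
{
    // Wrap the raw pointer so Thrust runs the search on the device.
    thrust::device_ptr<float> first = thrust::device_pointer_cast(deviceArray);
    thrust::device_ptr<float> it = thrust::max_element(first, first + length);
    // Dereferencing a device_ptr copies that single float back to the host.
    return *it;
}

int main()
{
    const int length = 1025;
    const size_t size = length * sizeof(float);

    float* deviceArray = NULL;
    if (cudaMalloc(&deviceArray, size) != cudaSuccess) return 1;
    cudaMemset(deviceArray, 0, size);

    fillKernel<<<(length + 255) / 256, 256>>>(deviceArray, length);

    // Catch launch errors first, then errors raised while the kernel ran.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(deviceArray);
        return 1;
    }

    float max = deviceMax(deviceArray, length); // a plain float, not a pointer
    printf("max = %f\n", max);

    cudaFree(deviceArray);
    return 0;
}
```

With the stand-in kernel the largest value written is length - 1. If a version like this gives the right answer while the real code still reports zero, the error check after the kernel launch is probably the first place to look.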
Any help would be much appreciated. Thanks.