In a CUDA C project, I want to use the Thrust library to find the maximum element of an array of floats. The Thrust function thrust::max_element() seems to be what I need. The array is the result of a CUDA kernel (which seems to work fine), so it is already in device memory when thrust::max_element() is called.

I am not very familiar with Thrust, but after reading the documentation for thrust::max_element() and the answers to similar questions on this site, I thought I understood how it works. Unfortunately I get wrong results, so I am apparently not using the library functions correctly. Can somebody please tell me what is wrong with my code?

float* deviceArray;
float* max;
int length = 1025;

*max = 0.0f;
size = (int) length*sizeof(float);     

cudaMalloc(&deviceArray, size);
cudaMemset(deviceArray, 0.0f, size);

// here I launch a cuda kernel which modifies deviceArray

thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(deviceArray);
*max = *(thrust::max_element(d_ptr, d_ptr + length));

I use the following headers:

#include <thrust/extrema.h>
#include <thrust/device_ptr.h>

I keep getting zero values for *max even though I am sure that deviceArray contains non-zero values after running the kernel. I am using nvcc as a compiler (CUDA 7.0) and I am running the code on a device with compute capability 3.5.

Any help would be much appreciated. Thanks.

1 Answer

This is not valid C code, because max is dereferenced before it points to anything:

float* max;
int length = 1025;

*max = 0.0f;

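You're not allowed to store data through a pointer (max) until you have provided an allocation for it and set the pointer to the address of that allocation. Writing through an uninitialized pointer is undefined behavior, so the result may be a crash or a silently wrong value.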
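
As a minimal sketch of that rule in plain host-side C++ (no CUDA required), there are two ways to give the result a valid place to live. The names compute_result, store_in_value and store_through_pointer are hypothetical and exist only for illustration; compute_result simply stands in for the dereferenced thrust::max_element() result.

```cpp
#include <memory>

// Hypothetical stand-in for "some computation that yields a float",
// e.g. the dereferenced result of thrust::max_element in the question.
float compute_result() { return 2.5f; }

// Option 1: an ordinary float variable. No pointer, no allocation needed.
float store_in_value() {
    float max = 0.0f;
    max = compute_result();
    return max;
}

// Option 2: keep a pointer, but make it point at real storage
// before anything is written through it.
float store_through_pointer() {
    std::unique_ptr<float> max(new float(0.0f));  // the allocation exists first
    *max = compute_result();                      // now the store is well defined
    return *max;                                  // storage is freed automatically
}
```

Both functions return 2.5 here. For a single scalar result, the first form is usually the simpler choice.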

Apart from that, the rest of your code seems to work for me:

$ cat t990.cu
#include <thrust/extrema.h>
#include <thrust/device_ptr.h>
#include <iostream>


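int main(){

  float* deviceArray;
  float max, test;   // max is a plain float, not an uninitialized pointer
  int length = 1025;

  max = 0.0f;
  test = 2.5f;
  size_t size = length * sizeof(float);

  // allocate the device array and zero it (cudaMemset sets bytes)
  cudaMalloc(&deviceArray, size);
  cudaMemset(deviceArray, 0, size);
  // stand-in for the kernel: write one non-zero value into element 0
  cudaMemcpy(deviceArray, &test, sizeof(float), cudaMemcpyHostToDevice);

  // wrap the raw device pointer so Thrust runs the search on the device
  thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(deviceArray);
  // max_element returns an iterator; dereferencing copies the value to the host
  max = *(thrust::max_element(d_ptr, d_ptr + length));
  std::cout << max << std::endl;

  cudaFree(deviceArray);
  return 0;
}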
$ nvcc -o t990 t990.cu
$ ./t990
2.5
$