CUDA find max value in shared memory

Question

I have a kernel which produces an array of result values and I want to find the maximum of these values efficiently. The Array is initialized in the beginning of the kernel with some negative value (for example -1). The Kernel is executed using 5 blocks each with 256 threads, for example.

Here are the problems:

Because of my data, i must terminate threads, that are not valid, so I am working sometimes with 256 threads, sometimes 50, 20 and so on.
In shared memory are written results from block, but as I mentioned, some array has 50 results, some has 256 results...(so shared array looks like this) 8,6,4,9,1,-1,-1,-1...
In that case how to efficiently find the maximum in one block ?

Parallel reduction will be complicated on these types of array, isn't it ? How to do this ?

What do you mean with 'terminate' threads? Do a 'return' for that thread? Because that is not recommendable — pQB
Have you considered using one of the free libraries available for finding the max value (e.g. ArrayFire)? — arrayfire

djmj djmj · Accepted Answer · 2012-04-19T12:33:45

There is not enough information about your algorithm.

What do you mean by n results? Are the ignored values in the array set to -1 or do you use dynamic extern shared memory and threads only write up to index n (sounds hard to realize)?

Use a fixed size shared memory and set ignored values to -1 and still perform parallel reduction, if you search positive maximum and its filled with -1 it doesn't matter, or
Don't terminate threads, instead set a flag in another array if thread should not perform further calculations and still do parallel reduction

CUDA find max value in shared memory

2 Answers