More efficient way of computing multiple FFTs with cuFFT than batching

Question

According to the NVIDIA documentation, a batched cuFFT plan executes its batches in parallel:

batch denotes the number of transforms that will be executed in parallel (https://docs.nvidia.com/cuda/cufft/index.html#function-cufftplan2d)

I want to perform a 2D FFT with 500 batches and I noticed that the computing time of those FFTs depends almost linearly on the number of batches. Therefore I wondered if the batches were really computed in parallel. A 2D FFT of 1500 by 1500 pixels with 500 batches runs in approximately 200 ms.

When a large number of FFTs must run concurrently, is batching the best approach to reduce the computing time, or should I consider streaming or some other method?

I could not find more detailed information about the internal execution of the batches in the NVIDIA documentation.

Batching is best, not streaming. Your GPU is being saturated with work which is why you see linear increase as you increase the batch size. — Robert Crovella

Unknown Unknown · Accepted Answer · 2019-11-15T11:45:17

I want to perform a 2D FFT with 500 batches and I noticed that the computing time of those FFTs depends almost linearly on the number of batches.

That is to be expected once enough parallel work has been scheduled to saturate the concurrent processing capacity of your GPU. There may not be linear dependence with a very small number of batches, but you should find that there is a transition from something close to constant time with a very small batch size, to linear time at large batch sizes.
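One way to look for that transition on your own hardware is to time the same batched plan at increasing batch counts. The following is only a minimal sketch under assumptions that are not in the question: single-precision complex-to-complex transforms, in-place execution, contiguous default data layout, and batch counts capped at 64 so the buffer (about 18 MB per transform) fits on most GPUs. The file name in the build comment is hypothetical.

```cuda
// Sketch: time in-place batched 2D C2C FFTs for growing batch counts.
// Assumed build command: nvcc batch_sweep.cu -lcufft -o batch_sweep
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    const int ny = 1500, nx = 1500;            // transform size from the question
    int dims[2] = {ny, nx};                    // slowest-varying dimension first
    const size_t elems = (size_t)ny * nx;      // complex samples per transform

    for (int batch = 1; batch <= 64; batch *= 2) {
        const size_t bytes = elems * batch * sizeof(cufftComplex);
        cufftComplex *data = nullptr;
        if (cudaMalloc(&data, bytes) != cudaSuccess) {
            printf("batch %d: allocation failed\n", batch);
            break;
        }
        cudaMemset(data, 0, bytes);            // contents do not matter for timing

        // NULL inembed/onembed selects the default contiguous layout,
        // in which case the stride and distance arguments are ignored.
        cufftHandle plan;
        if (cufftPlanMany(&plan, 2, dims, NULL, 1, 0, NULL, 1, 0,
                          CUFFT_C2C, batch) != CUFFT_SUCCESS) {
            printf("batch %d: plan creation failed\n", batch);
            cudaFree(data);
            break;
        }

        // Warm-up run so one-time initialization is not included in the timing.
        cufftExecC2C(plan, data, data, CUFFT_FORWARD);
        cudaDeviceSynchronize();

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cufftExecC2C(plan, data, data, CUFFT_FORWARD);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("batch %3d: %8.3f ms total, %6.3f ms per transform\n",
               batch, ms, ms / batch);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cufftDestroy(plan);
        cudaFree(data);
    }
    return 0;
}
```

If the GPU is already saturated by a single 1500 by 1500 transform, the per-transform time should stay roughly flat across the sweep. A per-transform time that drops at first and then levels off would mark the transition described above.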

Therefore I wondered if the batches were really computed in parallel.

You can assume that they are.

When a large number of FFTs must run concurrently, is batching the best approach to reduce the computing time

Yes.

...or should I consider streaming or some other method?

No.
