The OpenCL benchmarking site http://www.clbenchmark.com/ has benchmarks for:
Image Filter: Separable Gaussian Blur - Global Memory Usage and
Image Filter: Separable Gaussian Blur - Image Memory Usage.
Nvidia completely dominates the Global Memory Usage benchmark. For example, the GTX 580 is nearly twice as fast as the HD 7970. It's one of the few benchmarks where Nvidia still leads. Can someone explain why this is?
The reason I ask is that I have written a ray tracer that runs very fast on my GTX 590. Based on most reviews, I expected it to run four times faster on an HD 7970. However, it actually runs four times slower, and I don't understand why. I don't use image buffers; I write the pixels out to global memory. When I profile the kernel, I see that the kernel time on the HD 7950 is four times longer, so I know the problem is on the kernel side and not in moving data across the PCI bus.