Normal CUDA vs. cuBLAS?

Question

Just out of curiosity: cuBLAS is a library for basic matrix computations. But these computations can, in general, also be written easily in plain CUDA code, without using cuBLAS. So what is the major difference between using the cuBLAS library and writing your own CUDA program for matrix computations?

Is it similar to the relationship between plain C code and the BLAS library on the CPU, where the library applies compiler-level optimizations? But a GPU is intrinsically multi-threaded, so the situation may not be quite the same as on the CPU. Take matrix addition, for example. — Fontaine007

Jonathan Cohen · Accepted Answer · 2014-09-21T01:53:15

We highly recommend that developers use cuBLAS (or cuFFT, cuRAND, cuSPARSE, Thrust, NPP) where suitable, for several reasons:

We validate correctness across every supported hardware platform, including those which we know are coming up but which maybe haven't been released yet. For complex routines, it is entirely possible to have bugs which show up on one architecture (or even one chip) but not on others. This can even happen with changes to the compiler, the runtime, etc.
We test our libraries for performance regressions across the same wide range of platforms.
We can fix bugs in our code if you find them. Hard for us to do this with your code :)
We are always looking for which reusable and useful bits of functionality can be pulled into a library - this saves you a ton of development time, and makes your code easier to read by coding to a higher level API.

Honestly, at this point, I can probably count on one hand the number of developers out there who actually implement their own dense linear algebra routines rather than calling cuBLAS. It's a good exercise when you're learning CUDA, but for production code it's usually best to use a library.
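To make the comparison concrete for the matrix addition mentioned in the question, here is a minimal sketch of both approaches side by side. The names `addKernel` and `addOnGpu` are hypothetical helpers invented for this illustration; the library path uses `cublasSgeam`, which computes C = alpha*op(A) + beta*op(B). The sketch assumes column-major storage with no padding and omits error checking for brevity, so treat it as an outline rather than production code.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hand-written kernel: one thread per element of C = A + B.
__global__ void addKernel(const float* a, const float* b, float* c, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) c[i] = a[i] + b[i];
}

// Hypothetical helper: adds two m x n matrices (column-major, no padding)
// on the GPU, using either the custom kernel or cuBLAS.
// Return codes are ignored here; real code should check every call.
std::vector<float> addOnGpu(const std::vector<float>& a,
                            const std::vector<float>& b,
                            int m, int n, bool useCublas) {
    const int count = m * n;
    assert(static_cast<int>(a.size()) == count);
    assert(static_cast<int>(b.size()) == count);
    const size_t bytes = count * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    if (useCublas) {
        // Library path: C = 1*A + 1*B, no transposes, leading dimension m.
        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 1.0f;
        cublasSgeam(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n,
                    &alpha, dA, m, &beta, dB, m, dC, m);
        cublasDestroy(handle);
    } else {
        // Custom path: launch enough 256-thread blocks to cover every element.
        const int threads = 256;
        const int blocks = (count + threads - 1) / threads;
        addKernel<<<blocks, threads>>>(dA, dB, dC, count);
    }

    // The blocking copy on the default stream waits for the GPU work above.
    std::vector<float> c(count);
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Calling `addOnGpu(a, b, m, n, false)` and `addOnGpu(a, b, m, n, true)` on the same inputs should give the same result (compile with `nvcc` and link with `-lcublas`). For an operation as simple as addition, the hand-written kernel is short; the points above about validation and regression testing across architectures matter more as the routine gets more complex.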

(Disclosure: I run the CUDA Library team)
