CUDA brings together several things:
- Massively parallel hardware designed to run generic (non-graphic) code, with appropriate drivers for doing so.
- A programming language based on C for programming said hardware, and an assembly language that other programming languages can use as a target.
- A software development kit that includes libraries, various debugging, profiling and compiling tools, and bindings that let CPU-side programming languages invoke GPU-side code.
The point of CUDA is to write code that can run on compatible massively parallel SIMD architectures: this includes several GPU types as well as non-GPU hardware such as nVidia Tesla. Massively parallel hardware can run a significantly larger number of operations per second than the CPU, at a fairly similar financial cost, yielding performance improvements of 50× or more in situations that allow it.
One of the benefits of CUDA over the earlier methods is that a general-purpose language is available, instead of having to use pixel and vertex shaders to emulate general-purpose computers. That language is based on C with a few additional keywords and concepts, which makes it fairly easy for non-GPU programmers to pick up.
It's also a sign that nVidia is willing to support general-purpose parallelization on their hardware: it now sounds less like "hacking around with the GPU" and more like "using a vendor-supported technology", and that makes its adoption easier in presence of non-technical stakeholders.
To start using CUDA, download the SDK, read the manual (seriously, it's not that complicated if you already know C) and buy CUDA-compatible hardware (you can use the emulator at first, but performance being the ultimate point of this, it's better if you can actually try your code out)