I'm looking for a good lossless compression algorithm that can compress and decompress small amounts of data very quickly, such as 256 floats with values between 0 and 1. I know RLE, but maybe there's something better.
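For reference, the RLE baseline I have in mind looks roughly like the sketch below. It is plain host-side C++ rather than CUDA, and the names `Run`, `rle_encode` and `rle_decode` are made up for illustration. It compares raw bit patterns instead of float values so that the round trip is exactly lossless, and it assumes 32-bit IEEE-754 floats.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// One run: `count` consecutive floats sharing the same bit pattern.
struct Run {
    std::uint32_t bits;   // raw bits of the repeated value
    std::uint16_t count;  // run length; a 256-value block fits easily
};

// Run-length encode by comparing bit patterns, not float values,
// so +0.0f and -0.0f stay distinct and decoding is bit-exact.
std::vector<Run> rle_encode(const std::vector<float>& in) {
    std::vector<Run> out;
    for (float f : in) {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        if (!out.empty() && out.back().bits == bits &&
            out.back().count < UINT16_MAX) {
            ++out.back().count;          // extend the current run
        } else {
            out.push_back({bits, 1});    // start a new run
        }
    }
    return out;
}

// Expand the runs back into the original sequence of floats.
std::vector<float> rle_decode(const std::vector<Run>& runs) {
    std::vector<float> out;
    for (const Run& r : runs) {
        float f;
        std::memcpy(&f, &r.bits, sizeof f);
        out.insert(out.end(), r.count, f);
    }
    return out;
}
```

This only pays off when a block contains long runs of identical values (for example large empty regions), which is why I suspect something better exists for smoothly varying data.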
The background is that I'm working on volumetric data (e.g. 384³ floats) with CUDA, and instead of storing the volume explicitly I want to divide it into blocks of 8x8x4 voxels (256 floats each) and store the blocks in compressed form. The CUDA kernels (each thread block consisting of 8x8x4 threads) would then decompress the corresponding block, work on it, and compress it again.
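To make the layout concrete, here is a minimal sketch of the indexing I have in mind, written as plain C++ for illustration. The names `BlockCoord` and `locate` and the x-fastest ordering are assumptions for this example only. In the kernel, the block index would correspond to the CUDA block and the offset to the thread within it.

```cpp
#include <cassert>
#include <cstddef>

constexpr int kVolume = 384;                   // volume edge length in voxels
constexpr int kBx = 8, kBy = 8, kBz = 4;       // block size, same as the thread block
constexpr int kBlockVoxels = kBx * kBy * kBz;  // 256 floats per block
constexpr int kNx = kVolume / kBx;             // 48 blocks along x
constexpr int kNy = kVolume / kBy;             // 48 blocks along y
constexpr int kNz = kVolume / kBz;             // 96 blocks along z
constexpr int kNumBlocks = kNx * kNy * kNz;    // 221184 blocks in total

struct BlockCoord {
    std::size_t block;  // linear block index, 0 .. kNumBlocks - 1
    int offset;         // position inside the block, 0 .. 255
};

// Map a voxel coordinate to (block index, offset inside the block),
// with x varying fastest in both the block grid and the block itself.
BlockCoord locate(int x, int y, int z) {
    assert(x >= 0 && x < kVolume);
    assert(y >= 0 && y < kVolume);
    assert(z >= 0 && z < kVolume);
    const int bx = x / kBx, by = y / kBy, bz = z / kBz;  // which block
    const int tx = x % kBx, ty = y % kBy, tz = z % kBz;  // where inside it
    const int block = bx + kNx * (by + kNy * bz);
    const int offset = tx + kBx * (ty + kBy * tz);
    return {static_cast<std::size_t>(block), offset};
}
```

Since compressed blocks have different sizes, I would presumably also need a per-block offset table (or a fixed-size slot per block) to find where each one starts, which is part of why the choice of algorithm matters.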
I'm grateful for any suggestions!