0 votes

I've developed a lossless compression algorithm that compresses 32-bit integers (of unknown frequency/probability) to 31.95824 bits per integer (it works a lot better for smaller values, just as most compression algorithms do). Obviously it isn't possible to compress uniformly-distributed random data to become smaller than its uncompressed size.

Therefore my question is: which lossless compression algorithms get closest to the Shannon entropy of 32 bits per integer for pseudorandom 32-bit integers?

Essentially, I'm looking for a table of compression algorithms and their respective bits-per-integer values when compressing positive 32-bit integers.

Comments

I guess those that monitor input-to-output-size ratio and emit an itty-bitty "literal block" tag followed by exactly that. – greybeard

Have you tried any standard algorithms yourself? Are you working in a language where implementations are readily available? – Mark Ransom

P.S. There's really no difference between random 32-bit integers and random 8-bit ones. – Mark Ransom

@MarkRansom I'm just curious to see how my algorithm performs against others'. I've been testing other algorithms myself, but I imagined someone had asked a similar question before and might have useful information. Also, 8-bit integers unfortunately don't work well with my algorithm, since the set of possible values is so small. – Jacob G.

My point is that you can create a random 32-bit value by concatenating four 8-bit values. Standard algorithms will operate on 8-bit values. – Mark Ransom

2 Answers

1 vote

When you say "it works a lot better for smaller values", I presume that you have a transformation from the 32-bit integer to a variable-bit-length representation that is optimized for some non-uniform expected distribution of values. Then that same transformation applied to a uniform distribution of 32-bit values will necessarily take more than 32 bits on average. How much more depends on how non-uniform a distribution you started with.
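For instance, here is a minimal sketch in Python (not the asker's algorithm) that measures one common small-value-optimized code, unsigned LEB128 varints, on uniformly random 32-bit integers. Since almost all such values need the full five bytes, the average lands near 39.5 bits:

    import random

    def leb128_bits(n: int) -> int:
        """Bits used by an unsigned LEB128 varint encoding of n."""
        nbytes = 1
        while n >= 0x80:
            n >>= 7
            nbytes += 1
        return nbytes * 8

    random.seed(0)
    samples = (random.getrandbits(32) for _ in range(100_000))
    avg = sum(leb128_bits(x) for x in samples) / 100_000
    print(f"{avg:.2f} bits per integer on uniform data")  # prints roughly 39.5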

So the answer is, of course you can get to 32 bits exactly by doing nothing at all to the number. But then you are not optimized for the application implied by the non-uniform distribution you designed for.

0 votes

The identity function requires precisely 32 bits per 32 bit integer, which is pretty hard to beat. (There are many other length-preserving bijections, if you insist on changing the data stream.)
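For what it's worth, here is a trivial sketch of one such length-preserving bijection (purely illustrative): XOR each value with a fixed 32-bit key. The stream changes, the size does not, and the operation is its own inverse:

    KEY = 0x9E3779B9  # an arbitrary fixed 32-bit key, chosen for illustration

    def scramble(x: int) -> int:
        """Length-preserving bijection on 32-bit integers: XOR with a key."""
        return x ^ KEY

    def unscramble(y: int) -> int:
        """XOR with the same key inverts the mapping."""
        return y ^ KEY

    assert unscramble(scramble(0xDEADBEEF)) == 0xDEADBEEF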

It's not obvious to me what other criteria you might be employing to recommend an algorithm which does worse than that. Perhaps you believe that the input stream is not truly a uniform sample; rather, it is restricted to (or significantly biased towards) a subset of the universe, but you do not know a priori what that subset is. In that case, the entropy of the stream is less than 32 bits per integer (provided there is an upper bound on the size of the subset that is reasonably smaller than the size of the universe), and you might be able to actually compress the input stream.
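As a rough sketch of that effect (using zlib purely as a stand-in general-purpose compressor): if the stream only ever draws from, say, 256 distinct 32-bit values, an adaptive compressor discovers the bias without being told the subset and gets well under 32 bits per integer:

    import random
    import struct
    import zlib

    random.seed(1)
    subset = [random.getrandbits(32) for _ in range(256)]    # the hidden subset
    values = [random.choice(subset) for _ in range(10_000)]
    raw = struct.pack("<10000I", *values)  # plain 32 bits per integer

    packed = zlib.compress(raw, 9)
    print(f"{8 * len(packed) / 10_000:.2f} bits per integer")  # well under 32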

It's worth noting that unless messages are fixed-length, the length of the message needs to be taken into account in the computation of entropy, both in the numerator and the denominator. For very long messages that overhead can mostly be ignored, but if messages are short, the cost of message delimiters (or explicit length indicators) can be significant; prefixing each four-byte message with a one-byte length field, for example, already inflates it to 125% of its original size. (Otherwise, "compressing" to 103% of original size is a somewhat humptydumptyesque definition of "to compress".)