I am writing ZLIB
like API for an embedded hardware compressor which uses deflate algorithm for compression of given input stream.
Before going further i would like to explain data compression ratio. Data compression ratio is defined as the ratio between the uncompressed size and compressed size.
Compression ratio is usually greater than one. which mean compressed data is usually smaller than uncompressed data, which is whole point to do compression. but this is not always the case. for example using ZLIB
library and pseudo-random data generated on some Linux machine give compression ratio of 0.996 roughly. which mean 9960 bytes compressed into 10000 bytes.
I know ZLIB
handle this situation by using type 0 block where it simply return original uncompressed data with roughly 5 byte header so it give only 5 byte overhead up to 64KB data-block. This is intelligent solution of this problem but for some reason i can not use this in my API. I must have to provide extra safe space in advance to handle this situation.
Now if i know the least possible known data compression ratio it would be easy for me to calculate the extra space i have to provide. Otherwise to be safe, i have to provide more than needed extra space which can be crucial in embedded system.
While calculating data compression ratio, i am not concerned with header,footer,extremely small dataset and system specific details as i am separately handling that. What i am particularly interested in, is there exist any real dataset with minimum size of 1K and which can provide compression ratio less than 0.99
using deflate algorithm. In that case calculation would be:
Compression ratio = uncompressed size/(compressed size using deflate excluding header,footer and system specific overhead)
Please provide feedback. Any help would be appreciated. It would be great if reference to such dataset could be provided.
EDIT:
@MSalters comment indicate that hardware compressor is not following deflate specification properly and this can be a bug in microcode.
/dev/urandom
or other good PRNG like Xorshift, typically produces a deflated block of about 1072 bytes, yielding a deflate compression ratio of 0.96 of less. This shows that ~ 0.96 is common. You can find worse compression ratios in already compressed data. These are all common; the pathological cases that produce much worse compressions are rarer. – Nominal Animal