3
votes

I am writing a ZLIB-like API for an embedded hardware compressor which uses the deflate algorithm to compress a given input stream.

Before going further I would like to explain the data compression ratio. The data compression ratio is defined as the ratio between the uncompressed size and the compressed size:

Compression ratio = uncompressed size / compressed size

The compression ratio is usually greater than one, which means the compressed data is usually smaller than the uncompressed data; that is the whole point of compressing. But this is not always the case. For example, using the ZLIB library on pseudo-random data generated on a Linux machine gives a compression ratio of roughly 0.996, which means 9960 bytes are compressed into 10000 bytes.
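For anyone who wants to reproduce this kind of measurement, here is a minimal sketch (not the poster's API) using zlib's one-shot compress2(); note that its output includes the zlib header and trailer that the question deliberately excludes:

    /* Sketch: measure the deflate (zlib-wrapped) compression ratio of
     * pseudo-random data.  Link with -lz. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int main(void)
    {
        const uLong src_len = 10000;
        uLongf dst_len = compressBound(src_len);   /* worst-case output size */
        Bytef *src = malloc(src_len);
        Bytef *dst = malloc(dst_len);
        if (src == NULL || dst == NULL)
            return 1;

        srand(42);
        for (uLong i = 0; i < src_len; i++)
            src[i] = (Bytef)(rand() & 0xFF);       /* essentially incompressible */

        if (compress2(dst, &dst_len, src, src_len, Z_BEST_COMPRESSION) != Z_OK)
            return 1;

        printf("ratio = %.3f (%lu -> %lu bytes)\n",
               (double)src_len / (double)dst_len,
               (unsigned long)src_len, (unsigned long)dst_len);

        free(src);
        free(dst);
        return 0;
    }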

I know ZLIB handles this situation by using a type 0 (stored) block, where it simply returns the original uncompressed data with a roughly 5-byte header, so it adds only about 5 bytes of overhead per data block of up to 64 KB. This is an intelligent solution to the problem, but for some reason I cannot use it in my API. I have to provide the extra safe space in advance to handle this situation.

Now, if I knew the lowest possible compression ratio, it would be easy for me to calculate the extra space I have to provide. Otherwise, to be safe, I have to provide more extra space than needed, which can be critical in an embedded system.

While calculating the data compression ratio, I am not concerned with the header, footer, extremely small datasets, or system-specific details, as I am handling those separately. What I am particularly interested in is whether there exists any real dataset with a minimum size of 1 KB which gives a compression ratio of less than 0.99 using the deflate algorithm. In that case the calculation would be:

Compression ratio = uncompressed size / (compressed size using deflate, excluding header, footer and system-specific overhead)
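For what it's worth, here is one way that extra space could be sized once a worst-case ratio is assumed; the 0.99 ratio and the constant margin below are placeholders for whatever limit turns out to be safe, not known properties of the hardware:

    #include <stddef.h>

    /* Sketch only: size an output buffer from an assumed worst-case
     * compression ratio (uncompressed/compressed), here 0.99, plus a small
     * constant margin for block framing.  Both values are placeholders. */
    #define WORST_RATIO_NUM  99            /* ratio = 99/100 = 0.99 */
    #define WORST_RATIO_DEN 100

    static size_t output_buffer_size(size_t uncompressed_len)
    {
        /* output <= input / ratio, rounded up */
        size_t expanded = (uncompressed_len * WORST_RATIO_DEN
                           + WORST_RATIO_NUM - 1) / WORST_RATIO_NUM;
        return expanded + 16;              /* constant safety margin (placeholder) */
    }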

Please provide feedback. Any help would be appreciated. It would be great if a reference to such a dataset could be provided.

EDIT:
@MSalters' comment indicates that the hardware compressor is not following the deflate specification properly, and that this may be a bug in the microcode.

Are you aware that the compression ratio heavily depends on the data at hand? For example, a 1-byte text file will be at least 45 bytes (if I'm correct) when zipped... Also, you'd have to keep the file-system specifics in mind, specifically the block size, to be able to even estimate this. The lowest compression ratio is original data / (static overhead + original data) in the case of incompressible data... (Just a story: a friend of mine once thought about a compression scheme with a brilliant compression ratio: it would compress everything into 1 byte, and then always extract Microsoft Excel...) – ppeterka
@ppeterka66: Thanks, I am aware of what you said. If we use deflate, a zero-byte empty file is compressed into 2 bytes, and the 6-byte header+footer makes it 8 bytes in total, which gives a data compression ratio of zero :) I am not concerned with the header, footer, extremely small datasets, or system-specific details, as I am handling those separately. What I am particularly concerned with is real data with a minimum size of 1 KB which gives a compression ratio of less than 0.99. – raj_gt1
@raj_gt1: Please update your question to discuss your actual situation. You have tried to cast your real problem into an abstract one dealing with the deflate format. That question does not have a useful answer, since the lower limit on the compression ratio as you have defined it is zero, assuming that the only constraint is the deflate format. – Mark Adler
Compressing a random 1024-byte chunk of data, for example from /dev/urandom or another good PRNG like Xorshift, typically produces a deflated block of about 1072 bytes, yielding a deflate compression ratio of about 0.96 or less. This shows that ~0.96 is common. You can find worse compression ratios in already-compressed data. These are all common; the pathological cases that produce much worse compression are rarer. – Nominal Animal
My survey on compression goo.gl/2okQWR (see Section 2.2) shows that most practical techniques on real datasets achieve a compression ratio of ~2x, and some up to 4x, although the potential for 16x exists. Techniques with a higher ratio have higher overhead too. – user984260

3 Answers

2
votes

The deflate format takes a similar approach to the ZLIB algorithm. It uses a 3-bit block header, and the lower two bits are 00 when the following block is stored: length-prefixed but otherwise uncompressed.

This means the worst case is a one-byte input that blows up to 6 bytes (3 bits of header, 5 bits of padding to a byte boundary, 32 bits of length, 8 bits of data), so the worst ratio is 1/6 ≈ 0.17.

This is of course assuming an optimal encoder. A suboptimal encoder would transmit a Huffman table for that one byte.
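Following that reasoning, a small sketch of the bound this gives for arbitrary input sizes, assuming the encoder falls back to stored blocks (at most 65535 payload bytes and roughly 5 bytes of framing per block) whenever that is smaller:

    #include <stddef.h>
    #include <stdio.h>

    /* Worst-case deflate output size if the encoder can always fall back to
     * stored blocks: each block holds at most 65535 bytes and costs about
     * 5 bytes (3-bit header padded to a byte boundary, plus LEN and NLEN). */
    static size_t deflate_stored_bound(size_t len)
    {
        size_t blocks = len ? (len + 65534) / 65535 : 1;  /* at least one block */
        return len + 5 * blocks;
    }

    int main(void)
    {
        printf("%zu\n", deflate_stored_bound(1));       /* 6      -> ratio 1/6 */
        printf("%zu\n", deflate_stored_bound(10000));   /* 10005               */
        printf("%zu\n", deflate_stored_bound(100000));  /* 100010              */
        return 0;
    }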

3
votes

Because of the pigeonhole principle

http://en.wikipedia.org/wiki/Pigeonhole_principle

you will always have strings that get compressed and strings that get expanded.

http://matt.might.net/articles/why-infinite-or-guaranteed-file-compression-is-impossible/

Theoretically, you can achieve the best compression with zero-entropy data (an infinite compression ratio) and the worst compression with maximum-entropy data (e.g. AWGN-like noise), where the compression ratio tends towards zero.

3
votes

I can't tell from your question whether you're using zlib or not. If you're using zlib, it provides a function, deflateBound(), which does exactly what you're asking for, taking an uncompressed size and returning the maximum compressed size. It takes into account how the deflate stream was initialized with deflateInit() or deflateInit2() in computing the proper header and trailer sizes.
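For example, a minimal sketch of sizing the buffer with deflateBound(); the raw-deflate parameters here (windowBits = -15, memLevel = 8) are an assumption and should match however the real stream is initialized:

    #include <stdio.h>
    #include <zlib.h>

    int main(void)
    {
        z_stream strm = {0};    /* zalloc/zfree/opaque default to Z_NULL */

        /* Raw deflate, no zlib/gzip wrapper; adjust to the real setup. */
        if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                         -15, 8, Z_DEFAULT_STRATEGY) != Z_OK)
            return 1;

        uLong src_len = 10000;
        uLong max_out = deflateBound(&strm, src_len);   /* worst-case output */
        printf("deflateBound(%lu) = %lu\n",
               (unsigned long)src_len, (unsigned long)max_out);

        deflateEnd(&strm);
        return 0;
    }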

If you're writing your own deflate, then you will already know what the maximum compressed size is based on how often you allow it to use stored blocks.

Update: The only way to know for sure the maximum data expansion of a hardware deflator is to obtain the algorithm used. Then through inspection you can determine how often it will emit stored blocks for random data.

The only alternative is empirical and unreliable. You can feed the hardware compressor random data and examine the results. You can use infgen to disassemble the deflate output and see the stored blocks and their sizes. Then you can write a linear bounding formula for the expansion. Then add some margin to the additive and multiplicative terms to cover situations that you did not observe in your tests.
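As an illustration only, such a bound might end up looking like the sketch below; the coefficients are placeholders to be fitted from the measured expansions, with margin added on top:

    #include <stddef.h>

    /* Empirical linear bound on the hardware compressor's output size:
     * out <= b * in + a.  The values below are placeholders; measure the
     * device on random and already-compressed data, fit a and b, then add
     * margin to both terms. */
    #define BOUND_A      64        /* additive term, bytes (with margin)       */
    #define BOUND_B_NUM  10010     /* multiplicative term ~1.001 (with margin) */
    #define BOUND_B_DEN  10000

    static size_t hw_deflate_bound(size_t in_len)
    {
        return (in_len * BOUND_B_NUM + BOUND_B_DEN - 1) / BOUND_B_DEN + BOUND_A;
    }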

This will only work if the hardware deflate algorithm is well behaved, which means that it will not write a fixed or dynamic deflate block if a stored block would be smaller. If it is not well behaved, then all bets are off.