2
votes

I'm trying to figure out how to prove that the Lempel ZIV 77 algorithm for Compression is really gives optimal compression.

I found the following info:

So how well does the Lempel-Ziv algorithm work? In these notes, we’ll
calculate two quantities. First, how well it works in the worst case, and
second, how well it works in the random case where each letter of the message
is chosen uniformly and independently from a probability distribution, where
the ith letter appears with probability pi
. In both cases, the compression
is asymptotically optimal. That is, in the worst case, the length of the
encoded string of bits is n + o(n). Since there is no way to compress all
length-n strings to fewer than n bits, this can be counted as asymptotically
optimal. In the second case, the source is compressed to length
                                 α
H(p1, p2, . . . , pα)n + o(n) = n∑(-pi log2 pi) + O(n)
                                     i=1
which is to first order the Shannon bound.

What is meant here? And why there is no way to compress alllength-n strings to fewer than n bits?

Thanks all.

1

1 Answers

1
votes

There are 2^n different random strings of length n. In order to decompress them, the compression algorithm must compress them all to different compressed versions: if two different n-long strings compressed to the same sequence, there would be no way to tell which of them to decompress to. If all were compressed to strings of length k < n there would be only 2^k<2^n different compressed strings, and so there would have to be some cases where two different strings compressed to the same value.

Note that there is in no practical guaranteed optimal scheme for all circumstances. If I know that a long apparently random sequence is the output of a stream cipher with a key I also know I can describe it by giving only the design of the cipher and the key, but it might take a great deal of time for a compression algorithm to work out that the long apparently random sequence can be compressed hugely in this way.