Or, stated more accurately: when two identical strings are concatenated, why can't zlib deflate the entire second string as a single back-reference? It seems that when a matching string starts immediately after the previous instance of the same string, zlib emits the first character of the second copy as a literal and then emits a back-reference to the previous string minus its first character.
For example, if I use zlib to deflate the string latelate, the output is five literals followed by a back-reference...
l a t e l <len=3, dist=4>
or, Huffman encoded...
0000000 cb 49 2c 49 cd 01 62 00
0000010
Here I've simplified the output by using a "raw" deflate stream (i.e. windowBits = -15) and the fixed Huffman encoding (i.e. the compression strategy is Z_FIXED).
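For anyone who wants to reproduce this, here is a minimal sketch using Python's zlib bindings rather than the C API. The helper name `raw_fixed_deflate` is made up for illustration, and the compression level is an assumption since I haven't stated which one I used; the exact bytes may differ with other levels or zlib builds.

```python
import zlib

# Z_FIXED is exposed by recent Python versions; fall back to zlib's numeric
# value (4) if the constant is missing.
Z_FIXED = getattr(zlib, "Z_FIXED", 4)


def raw_fixed_deflate(data: bytes, level: int = zlib.Z_DEFAULT_COMPRESSION) -> bytes:
    """Deflate `data` as a raw stream (no zlib header) with fixed Huffman codes."""
    compressor = zlib.compressobj(
        level=level,            # assumption: default compression level
        method=zlib.DEFLATED,
        wbits=-15,              # negative windowBits selects a raw deflate stream
        strategy=Z_FIXED,       # forbid dynamic Huffman tables
    )
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    stream = raw_fixed_deflate(b"latelate")
    print(stream.hex(" "))                 # compare with the hex dump above
    print(zlib.decompress(stream, -15))    # round-trips back to the input
```

The round trip through `zlib.decompress(stream, -15)` only confirms the stream is valid; it does not show which literal and match tokens were chosen, so those still have to be read off the bits.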
Why must zlib emit the second 'l' as a literal before using a back-reference to "ate"?
In other words, why can't it output the following instead?
l a t e <len=4, dist=4>
I tried forcing the second version with my own deflate implementation, but zlib won't inflate the output: I get the error "invalid or incomplete deflate data".
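To separate "the format forbids this" from "my encoder has a bug", a minimal sketch of a hand-rolled fixed-Huffman encoder for exactly these token streams could look like the following. The names `BitWriter`, `fixed_litlen_code` and `encode_fixed_block` are hypothetical, and the sketch deliberately supports only match lengths 3 to 10 and distances 1 to 4, which need no extra bits. The code tables follow my reading of RFC 1951 section 3.2.6.

```python
import zlib


class BitWriter:
    """Packs bits into bytes starting at the least-significant bit of each byte."""

    def __init__(self):
        self.out = bytearray()
        self.acc = 0
        self.nbits = 0

    def put_bits(self, value, count):
        # Header fields (and extra bits) go in LSB first.
        for i in range(count):
            self.acc |= ((value >> i) & 1) << self.nbits
            self.nbits += 1
            if self.nbits == 8:
                self.out.append(self.acc)
                self.acc = 0
                self.nbits = 0

    def put_code(self, code, count):
        # Huffman codes go in MSB first.
        for i in reversed(range(count)):
            self.put_bits((code >> i) & 1, 1)

    def finish(self):
        if self.nbits:  # pad the last partial byte with zero bits
            self.out.append(self.acc)
            self.acc = 0
            self.nbits = 0
        return bytes(self.out)


def fixed_litlen_code(sym):
    """Return (code, bit_length) for a literal/length symbol in the fixed tables."""
    if sym <= 143:
        return 0x30 + sym, 8
    if sym <= 255:
        return 0x190 + (sym - 144), 9
    if sym <= 279:
        return sym - 256, 7
    return 0xC0 + (sym - 280), 8


def encode_fixed_block(tokens):
    """tokens: ints are literal bytes, (length, distance) tuples are matches."""
    w = BitWriter()
    w.put_bits(1, 1)  # BFINAL = 1
    w.put_bits(1, 2)  # BTYPE = 01 (fixed Huffman)
    for tok in tokens:
        if isinstance(tok, int):
            w.put_code(*fixed_litlen_code(tok))
        else:
            length, dist = tok
            # Sketch limitation: only symbols that carry no extra bits.
            if not (3 <= length <= 10 and 1 <= dist <= 4):
                raise ValueError("sketch supports len 3..10, dist 1..4 only")
            w.put_code(*fixed_litlen_code(254 + length))  # len 3 -> symbol 257
            w.put_code(dist - 1, 5)                       # 5-bit distance code
    w.put_code(*fixed_litlen_code(256))  # end of block
    return w.finish()


if __name__ == "__main__":
    zlib_style = encode_fixed_block(list(b"latel") + [(3, 4)])
    alternative = encode_fixed_block(list(b"late") + [(4, 4)])
    print(zlib_style.hex(" "))   # cb 49 2c 49 cd 01 62 00
    print(alternative.hex(" "))  # cb 49 2c 49 05 61 00
    print(zlib.decompress(alternative, -15))
```

Encoding the first token stream reproduces the hex dump above byte for byte, which is a useful sanity check on the bit ordering. If inflate accepts the second stream as well, that would suggest the rejected output from my own implementation has an encoding problem rather than a forbidden token sequence.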