0 votes

In the "Maximum Expansion Factor" section of the zlib technical details it is stated "In the worst possible case, where the other block types would expand the data, deflation falls back to stored (uncompressed) blocks."

I am having a hard time figuring out where in the zlib compress/deflate code this decision actually happens. I can see that deflate_stored() is called when the selected compression level is 0, which makes sense, but other than that I don't see where it is used.

If someone can point me in the right direction that would be helpful.

Also, at what granularity (in terms of the uncompressed data) are these decisions taken? I understand that in deflate a stored (uncompressed) block can be up to 64 KB, but there is no defined block size for the compressed chunks. Clearly it has to do with how useful the Huffman codes are for a block, but it would be nice to know whether there is a block size at which these decisions are taken.


2 Answers

1 vote

Here in trees.c:

/* ===========================================================================
 * Determine the best encoding for the current block: dynamic trees, static
 * trees or store, and write out the encoded block.
 */
void ZLIB_INTERNAL _tr_flush_block(s, buf, stored_len, last)

The size of a block is measured in symbols, i.e. the number of literals and length/distance pairs, not the size of the uncompressed data. When a block is emitted is determined by the memory setting, which by default gives 16383 symbols. At that time, or at the end of the input data if that comes first, it is determined whether a dynamic, static, or stored block for those symbols would be coded the smallest.
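
For reference, here is a minimal, self-contained C sketch of that three-way comparison, modeled on the logic of _tr_flush_block() in trees.c (the real function additionally special-cases the Z_FIXED strategy). The names opt_lenb and static_lenb mirror zlib's source, but the enum, the helper function, and the numbers in main are my own illustrative stand-ins, not zlib API:

#include <stdio.h>

typedef unsigned long ulg;

enum block_type { BT_STORED, BT_STATIC, BT_DYNAMIC };

/* opt_len and static_len are the accumulated bit costs of coding the
 * buffered symbols with the dynamic and static trees; stored_len is the
 * byte count of the raw input covered by the block. have_buf is zero if
 * the raw bytes are no longer available, in which case a stored block
 * is impossible. */
static enum block_type choose_block_type(ulg opt_len, ulg static_len,
                                         ulg stored_len, int have_buf)
{
    /* Convert bit costs to whole bytes, including the 3-bit block header. */
    ulg opt_lenb    = (opt_len + 3 + 7) >> 3;
    ulg static_lenb = (static_len + 3 + 7) >> 3;

    if (static_lenb <= opt_lenb)
        opt_lenb = static_lenb;          /* take the cheaper of the two */

    /* A stored block costs the raw length plus 4 bytes for LEN/NLEN. */
    if (stored_len + 4 <= opt_lenb && have_buf)
        return BT_STORED;
    if (static_lenb == opt_lenb)
        return BT_STATIC;
    return BT_DYNAMIC;
}

int main(void)
{
    /* Hypothetical costs: coding 1000 input bytes would take 8100 bits
     * with dynamic trees and 8300 bits with static trees, so storing
     * the 1000 raw bytes (+4) is cheapest. */
    enum block_type t = choose_block_type(8100, 8300, 1000, 1);
    printf("chosen: %s\n", t == BT_STORED ? "stored"
                         : t == BT_STATIC ? "static" : "dynamic");
    return 0;
}

The 16383-symbol figure comes from deflate.c, where the symbol buffer size lit_bufsize is 1 << (memLevel + 6): the default memLevel of 8 (settable through deflateInit2()) gives a 16384-entry buffer, and the block is flushed one symbol before it fills.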

1 vote

The decision at the block level is made in trees.c (https://github.com/madler/zlib/blob/master/trees.c).
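
To see the stored-block fallback empirically, here is a small self-contained experiment (my own sketch, not part of either answer) that compresses incompressible pseudo-random data with zlib's high-level API; the output comes out only slightly larger than the input because deflate gives up on Huffman coding and emits stored blocks:

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int main(void)
{
    const size_t n = 1 << 20;               /* 1 MiB of pseudo-random input */
    unsigned char *src = malloc(n);
    uLongf dlen = compressBound(n);          /* worst-case compressed size */
    unsigned char *dst = malloc(dlen);
    size_t i;

    if (src == NULL || dst == NULL)
        return 1;
    for (i = 0; i < n; i++)
        src[i] = (unsigned char)rand();      /* effectively incompressible */

    if (compress2(dst, &dlen, src, n, Z_BEST_COMPRESSION) != Z_OK)
        return 1;

    /* Expect dlen to exceed n only by the zlib wrapper plus a few bytes
     * of header per stored block. */
    printf("in: %lu  out: %lu  expansion: %.4f%%\n",
           (unsigned long)n, (unsigned long)dlen,
           100.0 * ((double)dlen - (double)n) / (double)n);

    free(src);
    free(dst);
    return 0;
}

Compile with -lz. On a typical run the expansion is a small fraction of a percent, consistent with the per-block overhead described in the "Maximum Expansion Factor" section quoted in the question.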