3
votes

I'm trying to use zlib to compress an arbitrary data stream but I only have a fixed-size, small (only a few KB) output file in which to store the resulting compressed data. I would like to compress as much of the input data as possible, and then it's acceptable to pitch any remaining input that can't be handled due to the limited output size.

All the zlib examples seem to assume that all output produced by deflate() can be consumed and written to the output file. In my case, I may be forced to stop mid-stream when the output file runs out of space. However, I always know the space remaining in my output file as I loop calling deflate(), and I never set avail_out to a value exceeding the remaining space. I always pass Z_SYNC_FLUSH to deflate(). I am concerned because the docs say that when deflate() returns with avail_out == 0 (which pretty much always happens in my case), there is potentially still more output data to consume and I need to call deflate() again. In my case I cannot always call deflate() again, because I have no output file space left to store any additional compressed data it might return.

So, is it OK to bail out in the middle of processing input, at any arbitrary point after calling deflate() with Z_SYNC_FLUSH, and expect that inflate() will succeed when handed all the compressed data returned up to and including that point? And, is Z_SYNC_FLUSH the correct/best approach here? Again, this approach appears to work experimentally, but I am not confident that it will always work just based on reading the zlib docs.

1 Answer

4
votes

Technically, yes, you can cut off a deflate stream at any byte, and inflate will be able to process up to your last byte and decompress whatever it can get out of the provided partial deflate stream.
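To see this behaviour for yourself, here is a minimal sketch using Python's standard zlib bindings rather than the C API. The helper name `inflate_truncated` and the sample data are made up for illustration. It cuts a zlib stream at an arbitrary byte and inflates whatever is left.

```python
import zlib

def inflate_truncated(data: bytes, limit: int) -> tuple:
    """Compress data, keep only the first `limit` compressed bytes, inflate those.

    Returns (recovered_bytes, stream_was_complete).
    """
    compressed = zlib.compress(data)
    truncated = compressed[:limit]          # cut the stream at an arbitrary byte
    d = zlib.decompressobj()
    recovered = d.decompress(truncated)     # yields whatever can be decoded so far
    return recovered, d.eof                 # eof is True only if the trailer was reached

# Hypothetical sample input: the decimal digits of 0..19999, concatenated.
sample = b"".join(str(i).encode() for i in range(20000))
recovered, complete = inflate_truncated(sample, 1000)
```

In this sketch `recovered` should be a leading portion of `sample`, and `complete` should be false because the end of the stream was never seen.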

However, that is not a good idea, since a) you lose the integrity check at the end of the stream, and b) you have the additional job of figuring out where the uncompressed data ends so that you can send the rest of it in another packet, I presume. The only way to do b) would be to replicate the decompression process on the compression end.
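Point b) could look roughly like the following, again sketched with Python's zlib bindings under some assumptions: the whole input is in memory, and the name `fit_prefix` is invented for this example. The sender truncates its own output to the budget, then inflates the kept bytes to learn how much of the input the receiver will actually get.

```python
import zlib

def fit_prefix(data: bytes, budget: int) -> tuple:
    """Truncate a zlib stream to `budget` bytes and report what survives.

    Returns (kept_bytes, recovered_input_length, stream_was_complete).
    """
    c = zlib.compressobj()
    stream = c.compress(data) + c.flush()   # full stream, ends with the Adler-32 check
    kept = stream[:budget]                  # what actually fits in the output file
    d = zlib.decompressobj()
    n = len(d.decompress(kept))             # replicate the receiver's inflate step
    return kept, n, d.eof                   # eof False means the check was cut off

# Hypothetical sample input and a 500-byte budget.
payload = b"".join(str(i).encode() for i in range(20000))
kept, n, complete = fit_prefix(payload, 500)
remainder = payload[n:]                     # input that still has to be sent elsewhere
```

When `complete` comes back false, the Adler-32 trailer was cut off, so the receiver has no integrity check for the `n` bytes it recovers.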

Alternatively, you could use fitblk.c (in the examples directory of the zlib distribution), which does three deflate passes to construct a bona fide deflate stream that fits in a fixed-size output block.
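If you just want to experiment with the idea of a complete, properly terminated stream that fits a fixed budget, here is a rough sketch in Python. Note that this is not the fitblk.c algorithm: it swaps in a plain binary search over the input prefix length, recompressing each candidate, and it assumes the input is fully in memory. The name `largest_fitting_prefix` is hypothetical. Since compressed size is not strictly monotonic in input length, the result is guaranteed to fit but is not guaranteed to be the longest possible prefix.

```python
import zlib

def largest_fitting_prefix(data: bytes, budget: int) -> tuple:
    """Binary-search for a prefix of data whose complete zlib stream fits in budget.

    Returns (compressed_stream, prefix_length). This is NOT fitblk.c's method.
    """
    best = zlib.compress(b"")               # smallest possible complete stream
    if len(best) > budget:
        raise ValueError("budget too small for even an empty zlib stream")
    lo, hi = 0, len(data)                   # invariant: best == compress(data[:lo])
    while lo < hi:
        mid = (lo + hi + 1) // 2
        blob = zlib.compress(data[:mid])
        if len(blob) <= budget:
            lo, best = mid, blob            # this prefix fits, try a longer one
        else:
            hi = mid - 1                    # too big, try a shorter one
    return best, lo

# Hypothetical sample input and a 2 KB output block.
source = b"".join(str(i).encode() for i in range(20000))
block, used = largest_fitting_prefix(source, 2048)
```

Unlike a stream cut off mid-way, `block` here ends with its check value, so `zlib.decompress(block)` verifies integrity, and `used` tells you where the leftover input starts.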