How to return how many bytes were actually consumed to return the decompressed data from inflate()?

Question

I am using zlib to inflate some deflate-compressed data. The caller specifies how many uncompressed bytes it wants through the z_stream's avail_out. What I need to know is how many bytes were actually consumed from the next_in buffer to inflate the requested avail_out number of bytes. The total_in amount reports how much input was read, but it is larger than what was actually "needed" to populate the next_out buffer.

Example:

I have 126 bytes of compressed data and want to get the first 4 bytes of the uncompressed data. Next, I want to pick up where I left off and get the next 4 bytes of uncompressed data, so:

Where do I set the next_in pointer after the first inflate to start in the compressed data, so that after inflation the next 4 bytes are what I want, as if I originally set avail_out as 8 and not 4?

I've tried Z_BLOCK to get the unused bits from the last byte but they don't align where I'd expect the next_in read to start.

Any ideas how to set next_in to inflate where I'd expect the next 4 uncompressed bytes to be? The stream is torn down between these calls.

Update:

I'm trying to do something like what this statement from the zlib how-to page describes:

For applications where zlib streams are embedded in other data, this routine would need to be modified to return the unused data, or at least indicate how much of the input data was not used, so the application would know where to pick up after the zlib stream.

Clarification:

A better question might be:

Is it possible to know which location in the compressed data corresponds to the 4 uncompressed bytes I wanted to read? Is there something within z_stream/zlib that stores or reports this?

This has all the tell-tale signs of the meta.stackexchange.com/questions/66377/what-is-the-xy-problem -- you need to explain what real problem you're trying to solve, that is, the real problem that you think can be solved by knowing how many input bytes were consumed. This is a meaningless metric. Input and output byte counts are not comparable one-to-one, due to compression's inherent nature. The bytes you got on output might have corresponded to just some bits in the last input byte consumed, with the remaining bits corresponding to the following, not yet read, uncompressed data. — Sam Varshavchik
@SamVarshavchik That seems to be the crux of the issue. Is there something in zstream that allows me to correlate the location in the compressed data to the 4 bytes of uncompressed data I wanted? — TWhite
@TWhite There may not even exist such a correlation. The nature of compression is that in general there can't be such a correlation. — David Schwartz
@DavidSchwartz and others. If that is the consensus could someone put that as an answer so I may mark it? — TWhite
@TWhite Well, we're not entirely sure what you're asking. I'd prefer to see a clarification of your outer problem before answering. — David Schwartz

Mark Adler · Accepted Answer · 2015-12-17T02:37:57

Where do I set the next_in pointer after the first inflate to start in the compressed data, so that after inflation the next 4 bytes are what I want, as if I originally set avail_out as 8 and not 4?

You don't set next_in. You just leave next_in alone. It already points to where it needs to point to continue decompression where you left off. You only need to potentially change next_in when you run out of input data, i.e. avail_in is zero.

The stream is torn down between these calls.

You cannot start decompression at some arbitrary point in the stream without having built up the context first that would get you to that point. That context is built up by decompressing everything that precedes it. You need the current sliding window of uncompressed data as well as the current dynamic code state.

If you are in control of the construction of the compressed data, you can insert points in the stream at which you can start decompression with no history, using Z_FULL_FLUSH. This ends the last block and starts a new block, and throws away the sliding window. Using this generally degrades compression.

For applications where zlib streams are embedded in other data, this routine would need to be modified to return the unused data, or at least indicate how much of the input data was not used, so the application would know where to pick up after the zlib stream.

That is not talking about what you think it's talking about. It refers to knowing where the end of the deflate stream is. It has nothing to do with trying to decompress from somewhere in the middle.

Is it possible to know where in the compressed data equates to the 4 uncompressed bytes I wanted read? Is there something within zstream/zlib that stores/reports this?

As you are decompressing, you can determine after any inflate() call returns how many bytes and bits have been consumed to generate the data returned so far. You just need to look at avail_in and data_type. (Read the inflate() documentation in zlib.h.) This will not allow you to decompress from that bit location at some later time, without having decompressed everything that preceded it.

What are you trying to do?
