I am trying to debug a problem with some code that uses zlib 1.2.8. The larger project can make archives, but it runs into Z_DATA_ERROR header problems when trying to extract them.
To investigate, I wrote a small test program in C++ that compresses ("deflates") a specified regular file, writes the compressed data to a second regular file, and extracts ("inflates") to a third regular file, one line at a time. I then diff the first and third files to make sure I get the same bytes.
For reference, this test project is located at: https://github.com/alexpreynolds/zlib-test and compiles under Clang (and should also compile under GNU GCC).
My larger question is how to deal with header data correctly in my larger project.
In my first test scenario, I can set up compression machinery with the following code:
z_error = deflateInit(this->z_stream_ptr, ZLIB_TEST_COMPRESSION_LEVEL);
Here, ZLIB_TEST_COMPRESSION_LEVEL is 1, to provide best speed. I then run deflate() on the z_stream pointer until no more compressed output is produced.
To extract these bytes, I can use inflateInit():
int ret = inflateInit(this->z_stream_ptr);
So what is the header format, in this case?
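One way I could check which wrapper a given scenario actually writes is to look at the first two bytes of the compressed file. The sketch below is based on my reading of RFC 1950 (zlib) and RFC 1952 (gzip); classify_header and StreamFormat are hypothetical names that are not part of zlib or of my test project.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper types; not part of zlib or of the test project.
enum class StreamFormat { Gzip, Zlib, Unknown };

// Classify a compressed stream by its first two bytes.
// gzip (RFC 1952) starts with the magic bytes 0x1f 0x8b.
// zlib (RFC 1950) starts with CMF and FLG, where the low nibble of CMF is 8
// (deflate), the high nibble is at most 7, and CMF * 256 + FLG is a
// multiple of 31.
// Raw deflate data has no header, so it is reported as Unknown here.
inline StreamFormat classify_header(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() < 2) return StreamFormat::Unknown;
    const unsigned b0 = bytes[0];
    const unsigned b1 = bytes[1];
    if (b0 == 0x1f && b1 == 0x8b) return StreamFormat::Gzip;
    const bool is_deflate = (b0 & 0x0f) == 8;
    const bool window_ok = (b0 >> 4) <= 7;
    const bool check_ok = ((b0 * 256u + b1) % 31u) == 0;
    if (is_deflate && window_ok && check_ok) return StreamFormat::Zlib;
    return StreamFormat::Unknown;
}
```

Reading the first two bytes of the second (compressed) file into a vector and passing them to this function would tell me which header each test scenario produced.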
In my second test scenario, I set up the deflate machinery like so:
z_error = deflateInit2(this->z_stream_ptr,
                       ZLIB_TEST_COMPRESSION_LEVEL,
                       ZLIB_TEST_COMPRESSION_METHOD,
                       ZLIB_TEST_COMPRESSION_WINDOW_BITS,
                       ZLIB_TEST_COMPRESSION_MEM_LEVEL,
                       ZLIB_TEST_COMPRESSION_STRATEGY);
These deflate constants are, respectively, 1 for level, Z_DEFLATED for method, 15+16 (or 31) for window bits, 8 for memory level, and Z_DEFAULT_STRATEGY for strategy.
The earlier inflateInit() call does not work; instead, I must use inflateInit2() and specify a modified window bits value:
int ret = inflateInit2(this->z_stream_ptr, ZLIB_TEST_COMPRESSION_WINDOW_BITS + 16);
In this case, the window bits value is not 31 as in the deflateInit2() call, but 15+32 (or 47).
If I use 31 (or any value other than 47), then I get a Z_DATA_ERROR on subsequent inflate() calls. That is, if I use the same window bits for the inflateInit2() call:
int ret = inflateInit2(this->z_stream_ptr, ZLIB_TEST_COMPRESSION_WINDOW_BITS);
Then I get the following error on attempting to inflate():
Error: inflate to stream failed [-3]
Here, -3 is the same as Z_DATA_ERROR.
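To make error messages like the one above easier to read, a small lookup from return code to symbolic name could be used. This is only a sketch: zlib_code_name is a hypothetical helper, and the numeric values are copied from the constants in zlib.h rather than taken from the header itself.

```cpp
#include <string>

// Hypothetical helper: map a zlib return code to its symbolic name.
// The numeric values mirror the return-code constants defined in zlib.h.
inline std::string zlib_code_name(int code) {
    switch (code) {
        case 0:  return "Z_OK";
        case 1:  return "Z_STREAM_END";
        case 2:  return "Z_NEED_DICT";
        case -1: return "Z_ERRNO";
        case -2: return "Z_STREAM_ERROR";
        case -3: return "Z_DATA_ERROR";
        case -4: return "Z_MEM_ERROR";
        case -5: return "Z_BUF_ERROR";
        case -6: return "Z_VERSION_ERROR";
        default: return "unknown zlib code";
    }
}
```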
According to the documentation, using 31 with deflateInit2() should write a gzip header and trailer. Thus, I would expect 31 on the following inflateInit2() call to be able to read that header.
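For reference, here is how I read the zlib.h comments on the windowBits argument of inflateInit2(), written out as a small function. The names InflateWrapper and inflate_wrapper_for are hypothetical, and the mapping is my interpretation of the documentation, not code taken from zlib.

```cpp
// Hypothetical names; this encodes my reading of the zlib.h comments
// for inflateInit2(), it is not code from zlib itself.
enum class InflateWrapper { RawDeflate, ZlibOnly, GzipOnly, AutoDetect, Invalid };

inline InflateWrapper inflate_wrapper_for(int window_bits) {
    // Negative values (-8..-15) request raw deflate with no header or trailer.
    if (window_bits < 0) {
        return (window_bits >= -15 && window_bits <= -8)
                   ? InflateWrapper::RawDeflate
                   : InflateWrapper::Invalid;
    }
    if (window_bits >= 48) return InflateWrapper::Invalid;

    // The low four bits are the base window size: 8..15, or 0 to take the
    // size from the stream header.
    const int base = window_bits & 15;
    if (base != 0 && base < 8) return InflateWrapper::Invalid;

    // Adding 16 selects gzip only; adding 32 selects automatic detection
    // of zlib or gzip headers.
    switch (window_bits >> 4) {
        case 0:  return InflateWrapper::ZlibOnly;
        case 1:  return InflateWrapper::GzipOnly;
        default: return InflateWrapper::AutoDetect;
    }
}
```

Under this reading, 15+16 = 31 means "gzip only" and 15+32 = 47 means "detect zlib or gzip automatically", which is why I expected 31 to be accepted for data written with a gzip header.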
Why does the modified value 47 work, but not 31?
My test project is mostly similar to the example code on the zlib site, with the exception of the extraction/inflation code, which inflates one z_stream chunk at a time and parses the output for newline characters.
Is there something special about running inflate() only when a new buffer of extracted data is asked for (like header information going missing between inflate() calls), as opposed to running the whole extraction in one pass, as in the zlib example code?
My larger debugging problem is finding a robust way to extract a chunk of zlib-compressed data only on request, so that I can extract data one line at a time, as opposed to getting the whole extracted file. Something about the way I am handling the zlib format parameter seems to be tripping me up, but I can't figure out why or how to fix this.
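To make the goal concrete, here is a minimal sketch of the on-demand, line-at-a-time interface I am after. It is not the code from my test project: LineReader and ChunkSource are hypothetical names, and the chunk source is a plain callback standing in for whatever would call inflate() on the next piece of compressed input.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch; these names are illustrative only.
// A ChunkSource appends the next piece of extracted data to `out` and
// returns false once no more data is available. In a real program it
// would wrap a call to inflate().
using ChunkSource = std::function<bool(std::string& out)>;

class LineReader {
public:
    explicit LineReader(ChunkSource source) : source_(std::move(source)) {}

    // Store the next line (without its '\n') in `line`.
    // Returns false once all data has been consumed.
    bool next_line(std::string& line) {
        for (;;) {
            const std::size_t pos = pending_.find('\n', scan_from_);
            if (pos != std::string::npos) {
                line.assign(pending_, 0, pos);
                pending_.erase(0, pos + 1);
                scan_from_ = 0;
                return true;
            }
            // No newline yet: remember how far we have scanned.
            scan_from_ = pending_.size();
            if (done_) {
                if (pending_.empty()) return false;
                // Flush a final line that has no trailing newline.
                line.swap(pending_);
                pending_.clear();
                scan_from_ = 0;
                return true;
            }
            // Only ask for more extracted data when the buffer runs dry.
            if (!source_(pending_)) done_ = true;
        }
    }

private:
    ChunkSource source_;
    std::string pending_;        // extracted bytes not yet returned as lines
    std::size_t scan_from_ = 0;  // offset where the newline search resumes
    bool done_ = false;          // true once the source is exhausted
};
```

The point of this shape is that all stream state lives in one object between calls, so the source is only invoked when the buffered data contains no complete line.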