how to zlib inflate a gzip/deflate archive

Question

I have an archive compressed with gzip 1.5, but I'm unable to decode it using the C zlib library. inflate() returns error code -3 (Z_DATA_ERROR) with stream.msg = "unknown compression method".

$ gzip --list --verbose vmlinux.z
method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
defla 12169518 Apr 29 13:00             4261643             9199404  53.7% vmlinux

The first 32 bytes of the file are:

00000000  1f 8b 08 08 29 f4 8a 60  00 03 76 6d 6c 69 6e 75  |....)..`..vmlinu|
00000010  78 00 ec 9a 7f 54 1c 55  96 c7 6f 75 37 d0 fc 70  |x....T.U..ou7..p|

I see that the first 18 bytes are the RFC 1952 gzip header. After the NUL that terminates the file name, I expect the next byte to start either an RFC 1951 deflate stream or an RFC 1950 zlib stream (I'm not sure which).
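To show where the 18 comes from, here is a minimal sketch that walks the header fields in the order RFC 1952 lists them. The name gzip_header_len is a hypothetical helper, not a zlib function, and it is only relevant when skipping the header by hand; it is not needed when the whole file is handed to zlib with gzip decoding requested.

```c
#include <assert.h>
#include <stddef.h>

/* FLG bits from RFC 1952 */
#define GZ_FHCRC    0x02
#define GZ_FEXTRA   0x04
#define GZ_FNAME    0x08
#define GZ_FCOMMENT 0x10

/* Hypothetical helper (not part of zlib): returns the offset of the first
 * deflate byte, or 0 if the header is malformed or truncated. */
size_t gzip_header_len(const unsigned char *buf, size_t len)
{
    size_t pos = 10;   /* fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS */
    unsigned flg;

    assert(buf != NULL);
    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return 0;
    flg = buf[3];

    if (flg & GZ_FEXTRA) {            /* 2-byte little-endian XLEN, then XLEN bytes */
        size_t xlen;
        if (len - pos < 2) return 0;
        xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen) return 0;
        pos += xlen;
    }
    if (flg & GZ_FNAME) {             /* NUL-terminated original file name */
        while (pos < len && buf[pos] != 0) pos++;
        if (pos == len) return 0;
        pos++;
    }
    if (flg & GZ_FCOMMENT) {          /* NUL-terminated comment */
        while (pos < len && buf[pos] != 0) pos++;
        if (pos == len) return 0;
        pos++;
    }
    if (flg & GZ_FHCRC) {             /* 2-byte header CRC */
        if (len - pos < 2) return 0;
        pos += 2;
    }
    return pos;
}
```

For the dump above, FLG is 08 (FNAME only), so the result is 10 fixed bytes plus "vmlinux" and its NUL, which gives 18 (0x12).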

So I pass zlib inflate() a z_stream whose next_in points to the byte at offset 0x12.

If this were deflate encoded, I would expect the byte at 0x12 to have the form 0aabbbbb (BFINAL=0 and BTYPE=some compression).

If this were zlib encoded, I would expect the two bytes at 0x12 to take the form 0aaa1000 bbbccccc.

Instead, the byte at 0x12 is 0xEC = 1110 1100, which fits neither of those.
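The zlib half of that expectation can be written down as a check against RFC 1950: the low nibble of the first byte (CM) must be 8, the high nibble (CINFO) must be at most 7, and the two bytes taken as a big-endian 16-bit value must be a multiple of 31. This is only a sketch; looks_like_zlib_header is a hypothetical name, not part of zlib.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of zlib): returns 1 if the two bytes at p
 * form a plausible RFC 1950 zlib header (CMF, FLG), else 0. */
int looks_like_zlib_header(const unsigned char *p)
{
    unsigned cmf, flg;

    assert(p != NULL);
    cmf = p[0];
    flg = p[1];

    if ((cmf & 0x0fu) != 8)   /* CM: 8 means deflate */
        return 0;
    if ((cmf >> 4) > 7)       /* CINFO: window size is 2^(CINFO+8), at most 32K */
        return 0;
    /* FCHECK makes CMF*256 + FLG a multiple of 31 */
    return ((cmf << 8) | flg) % 31u == 0;
}
```

The bytes ec 9a fail the first test because the low nibble of ec is 0xc, not 8, so the data at 0x12 is not a zlib stream. A common zlib header such as 78 9c passes all three tests.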

For my code, I took the uncompress() code and modified it slightly with allocators appropriate to my environment and several different experiments with the window bits (including 15+16, -MAX_WBITS, and MAX_WBITS).

int ZEXPORT unzip (dest, destLen, source, sourceLen)
    Bytef *dest;
    uLongf *destLen;
    const Bytef *source;
    uLong sourceLen;
{
    z_stream stream;
    int err;

    stream.next_in = (Bytef*)source;
    stream.avail_in = (uInt)sourceLen;
    /* Check for source > 64K on 16-bit machine: */
    if ((uLong)stream.avail_in != sourceLen) return Z_BUF_ERROR;

    stream.next_out = dest;
    stream.avail_out = (uInt)*destLen;
    if ((uLong)stream.avail_out != *destLen) return Z_BUF_ERROR;

    stream.zalloc = (alloc_func)my_alloc;
    stream.zfree = (free_func)my_free;

    /*err = inflateInit(&stream);*/
    err = inflateInit2(&stream, 15 + 16);
    if (err != Z_OK) return err;

    err = inflate(&stream, Z_FINISH);
    if (err != Z_STREAM_END) {
        inflateEnd(&stream);
        return err == Z_OK ? Z_BUF_ERROR : err;
    }
    *destLen = stream.total_out;

    err = inflateEnd(&stream);
    return err;
}

How can I correct my decoding of this file?

You have provided exactly none of your code in your question. Did you use inflateInit2() and request gzip decoding? — Mark Adler
Hi Mark, thanks for looking. I've added the code. I hope that helps. If you spot the obvious error, I'd love your insights. I'm also interested in how I should understand the encoding after the gzip header. What do those bytes mean? — PaulH

Mark Adler · Accepted Answer · 2021-05-04T01:51:26

That should work fine, assuming that my_alloc and my_free do what they need to do. You should verify that you are actually giving unzip() the data that you think you are giving it. The data you give it needs to start with the 1f 8b.

(Side comment: "unzip" is a lousy name for the function. It does not unzip, since zip is an entirely different format than either gzip or zlib. "gunzip" or "ungzip" would be appropriate.)

You are manually reading the bits in the deflate stream in the wrong order. The least significant bits come first. The low three bits of 0xec are 100, indicating a non-last dynamic block: 0 for non-last (BFINAL), then 10 for dynamic (BTYPE).
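As a small illustration of that bit order, the two fields can be pulled out of the first byte with a mask and a shift. The function names here are made up for the example and are not zlib API.

```c
#include <assert.h>

/* BFINAL is bit 0 of the first byte of a deflate block. */
unsigned deflate_bfinal(unsigned char first)
{
    return first & 1u;
}

/* BTYPE is bits 1..2 of the first byte, with bit 1 as the low bit. */
unsigned deflate_btype(unsigned char first)
{
    return (first >> 1) & 3u;
}

/* Block type names as defined by RFC 1951. */
const char *btype_name(unsigned btype)
{
    static const char *const names[4] = {
        "stored", "fixed", "dynamic", "reserved (invalid)"
    };
    assert(btype <= 3);
    return names[btype];
}
```

For 0xec this gives BFINAL = 0 and BTYPE = 2, that is, a non-last block with dynamic Huffman codes.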

You can use infgen to disassemble a deflate stream. Its output for the 14 bytes provided is this initial portion of a dynamic block:

dynamic
count 286 27 16
code 0 5
code 2 7
code 3 7
code 4 5
code 5 5
code 6 4
code 7 4
code 8 2
code 9 3
code 10 2
code 11 4
code 12 4
code 16 7
code 17 7
lens 4 6 7 7 7 8 8 8 7 8
repeat 3
lens 10
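The "count 286 27 16" line can be reproduced by hand with a least-significant-bit-first reader. The following is a sketch of that idea only, with hypothetical names; it is not how infgen or zlib is implemented. After BFINAL and BTYPE, a dynamic block stores HLIT (5 bits, plus 257), HDIST (5 bits, plus 1) and HCLEN (4 bits, plus 4), per RFC 1951.

```c
#include <assert.h>
#include <stddef.h>

struct bitreader {
    const unsigned char *buf;
    size_t len;
    size_t bitpos;   /* index of the next bit; bit 0 is the LSB of buf[0] */
};

/* Read n bits (n <= 16), least significant bit first. */
unsigned get_bits(struct bitreader *br, unsigned n)
{
    unsigned value = 0, i;

    assert(n <= 16);
    for (i = 0; i < n; i++) {
        size_t byte = br->bitpos >> 3;
        unsigned bit;

        assert(byte < br->len);   /* the caller must supply enough input */
        bit = (br->buf[byte] >> (br->bitpos & 7)) & 1u;
        value |= bit << i;
        br->bitpos++;
    }
    return value;
}

struct dynamic_header {
    unsigned bfinal, btype;
    unsigned nlit, ndist, nclen;   /* only meaningful when btype == 2 */
};

/* Decode the first 17 bits of a block, assuming it is a dynamic block. */
struct dynamic_header read_dynamic_header(const unsigned char *buf, size_t len)
{
    struct bitreader br;
    struct dynamic_header h;

    br.buf = buf;
    br.len = len;
    br.bitpos = 0;
    h.bfinal = get_bits(&br, 1);
    h.btype  = get_bits(&br, 2);
    h.nlit   = get_bits(&br, 5) + 257;   /* HLIT */
    h.ndist  = get_bits(&br, 5) + 1;     /* HDIST */
    h.nclen  = get_bits(&br, 4) + 4;     /* HCLEN */
    return h;
}
```

Fed the bytes ec 9a 7f from offset 0x12, this yields bfinal 0, btype 2, and the counts 286, 27 and 16, matching the first two lines of the infgen output.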
