0
votes

I am using the zlib library (compiled from source) to deflate and inflate gzip, zlib, and raw streams. I have written wrapper classes for compression and decompression (Compressor/Decompresser), along with several test cases (GZip, Zlib, Raw, Auto-Detect). The tests pass for Zlib, Raw, and Auto-Detect (with zlib data), but not for GZip (window bits of 15u | 16u).

Here is my compress function.

    std::vector<char> out(zlib->avail_in + 8);

    deflateInit2(zlib.get(), Z_DEFAULT_COMPRESSION, Z_DEFLATED, static_cast<int32_t>(mode), 8, Z_DEFAULT_STRATEGY);

    zlib->avail_out = out.size();
    zlib->next_out = reinterpret_cast<Bytef*>(out.data());

    deflate(zlib.get(), Z_FINISH);

    out.resize(zlib->total_out + 3);

    deflateEnd(zlib.get());
    return std::move(out);

And here is decompress

    uIntf multiplier = 2;
    uIntf currentSize = zlib->avail_in * (multiplier++) * 1000 /* Just to make sure enough output space(will implement loop) */;

    std::vector<char> out(currentSize);

    inflateInit2(zlib.get(), static_cast<int>(mode));

    zlib->avail_out = out.size();
    zlib->next_out = reinterpret_cast<Bytef*>(out.data());

    inflate(zlib.get(), Z_FINISH);

    out.resize(zlib->total_out);
    inflateEnd(zlib.get());
    return std::move(out);

The input is set in a separate function (which does get called) that looks like this. (The char* buffer has not been deleted by the time compress/decompress is called.)

    zlib->next_in = reinterpret_cast<Bytef*>(bytes);
    zlib->avail_in = static_cast<uIntf>(length);

I also have a mode enum

    enum class Mode : int32_t {
        AUTO = 15u | 32u, // Never used on compress
        GZIP = 15u | 16u,
        ZLIB = 15,
        RAW = -15
    };

Note: The test cases with mode AUTO (paired with zlib data), ZLIB, and RAW pass. The GZIP test case fails. (The test data is just a simple alphanumeric character array.)

I also debugged the output of the gzip decompress (after it failed): the output is missing the last 3 characters (y, z, and the terminating character).

Another note: the constructors of the wrapper classes look like this

    zlib->zalloc = Z_NULL;
    zlib->zfree = Z_NULL;
    zlib->opaque = Z_NULL;
    zlib->avail_in = 0;
    zlib->next_in = Z_NULL;

1 Answer

1
votes

First off, a bunch of scattered code fragments with no context makes it impossible to see what's happening. See How to create a Minimal, Reproducible Example for how to provide a decent example.

Second, you are not saying what is returning Z_BUF_ERROR. There aren't even any places in your code fragments where you retain the return values of deflate() or inflate(), so it's not even possible for you to see a Z_BUF_ERROR! You need to at least do something like int ret = deflate(zlib.get(), Z_FINISH); and then check the value of ret.

Third, I cannot tell from your code fragments where, or even if, you set the input pointer and length. Is the length set to zero before the inits? Or is it set to the data? Or are the data pointer and length set after the inits? See the MRE link above.

Fourth, we don't have the example data that you're using. So we cannot reproduce the error. Again, see the MRE link.

OK, so making a stab in the dark here, I will guess that deflate() is returning the error. Then the problem is likely that you have not provided enough output space, and you have asked for Z_FINISH, which is telling deflate() you have provided enough output space. In that case, deflate() returning Z_BUF_ERROR means that you didn't. Compression can expand the data if it is not compressible, and gzip adds more header and trailer information than zlib. Your + 8 is inadequate to account for those two things. A zlib header and trailer total six bytes, whereas a gzip header and trailer total at least 18 bytes. The expansion is a multiplier on the input, adding a fraction of a percent, whereas you have no multiplier on the length at all.

zlib provides a function for just this purpose, deflateBound(). You would call it after deflateInit() (or deflateInit2(), as in your code) and before deflate(), passing the size of your input, and it will return an upper bound on the size of the compressed output.

However, it is better to call deflate() multiple times in a loop. For most practical applications, it is necessary to call inflate() multiple times in a loop. You acknowledge this in your code comment, as well as in your attempt (also inadequate) to account for the possible size of the inflated data by multiplying by a thousand.

You can find a heavily commented example of how to use the zlib functions properly, with loops, at zlib Usage Example.