3
votes

I need to decompress some zlib-compressed files found within a game's save data. I have no access to the game's source. Each file begins with 0x789C, which tells me that they are indeed compressed with zlib. However, all calls to inflate on these files fail to decompress fully and return Z_DATA_ERROR. I've tried zlib versions 1.2.5, 1.2.8, and 1.2.11 with identical results.

Even though zlib is telling me the input data is corrupt, I'm confident that it is not since the game is able to decompress these files with no issues AND this is not isolated to a single data stream. I have hundreds of thousands of unique data streams compressed the same way and they all throw a Z_DATA_ERROR somewhere in the middle of the decompression.

Furthermore, the portion of the data that IS successfully decompressed before the error is correct. The output is exactly as expected.

Also, about 10% of the time, zlib WILL decompress the entire file; however, the result is not correct. Large chunks of the decompressed data contain the same byte repeated over and over, which tells me it was a false positive.

Here's my decompression code:

//QByteArray is a Qt wrapper for a char *
QByteArray Compression::DecompressData(QByteArray data)
{
    QByteArray result;

    int ret;
    z_stream strm;
    static const int CHUNK_SIZE = 1;//set to 1 just for debugging
    char out[CHUNK_SIZE];

    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = data.size();
    strm.next_in = (Bytef*)(data.data());

    ret = inflateInit2(&strm, -15);
    if (ret != Z_OK)
    {
        qDebug() << "init error" << ret;
        return QByteArray();
    }

    do
    {
        strm.avail_out = CHUNK_SIZE;
        strm.next_out = (Bytef*)(out);

        ret = inflate(&strm, Z_NO_FLUSH);
        qDebug() << "debugging output: " << ret << QString::number(strm.total_in, 16);//This tells me which input byte caused the failure
        Q_ASSERT(ret != Z_STREAM_ERROR);

        switch (ret)
        {
        case Z_NEED_DICT:
            ret = Z_DATA_ERROR;
            // intentional fall-through: treat a dictionary request as a data error
        case Z_DATA_ERROR:
        case Z_MEM_ERROR:
            (void)inflateEnd(&strm);
            return result;
        }

        result.append(out, CHUNK_SIZE - strm.avail_out);
    } while (strm.avail_out == 0);

    inflateEnd(&strm);
    return result;
}

Here is a pastebin of an example file's compressed data with the 0x789C header and trailing CRC removed. I can supply literally endless example files. All of them have the same issue.

Running that data through the above function will decompress the beginning of the stream correctly, but fail on input byte 0x18C. You can tell it decompressed correctly when the start of the file begins with 0x000B and the decompressed data is longer than the input data.

I wish I knew more about deflate compression to solve this problem myself. My initial thoughts are that the game has decided to use a custom version of zlib or an extra parameter needs to be given to zlib in order to decompress it correctly. I've asked around and tried many things for days, and I really need someone with knowledge on the subject to weigh in here. Thanks for your time!

1
If you really want to get to the bottom of this, it might be helpful to provide a larger sampling of savegames, and/or identify the game so other people can produce their own. - mwfearnley
@mwfearnley there are thousands of these compressed files in this single savegame. - mrg95
Oh, OK.. but it's just the one sample stream you've pasted, right? Possibly looking at multiple streams would make it possible to find a consistent way that the data is being mangled... Also, I was wondering how you're able to verify the correctness of the partial data? - mwfearnley
I'm able to verify the correctness because I know what to expect in the output. I'm very experienced regarding the save format of this specific game; however, it's just this one edition of the game that has different compression. I've attempted to decompress all of the streams in a batch. A very, very small percentage of them do decompress with no errors, but the data is still partially wrong. I may post all the streams if that would help. - mrg95

1 Answer

2
votes

The provided data is indeed an invalid deflate stream, with both distances too far back and eight bytes of junk after the deflate stream has ended. There is nothing apparently wrong with your code.

As you noted, at offset 396 there is the first of ten distances too far back. That's where inflate stops. At offset 3472, almost at the end, there is a stored block with a length that doesn't check against its complement.
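
For reference, the stored-block check mentioned above can be sketched as follows. In the deflate format, a stored block's header is padded to a byte boundary and followed by a 16-bit LEN and a 16-bit NLEN, both little-endian, where NLEN must be the one's complement of LEN. The function name `storedLengthIsValid` is made up for this illustration and is not part of zlib.

```cpp
#include <cstdint>

// p points at the four bytes that follow a stored block's header
// (after the header bits have been padded to a byte boundary):
// LEN (2 bytes, little-endian) followed by NLEN (2 bytes, little-endian).
bool storedLengthIsValid(const unsigned char* p)
{
    const std::uint16_t len  = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    const std::uint16_t nlen = static_cast<std::uint16_t>(p[2] | (p[3] << 8));
    // A valid stored block has NLEN equal to the bitwise complement of LEN.
    return len == static_cast<std::uint16_t>(~nlen);
}
```

inflate performs an equivalent check internally and reports Z_DATA_ERROR when it fails, so this is only useful for inspecting the bytes by hand.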

For the distances too far, you could try setting a dictionary of 32K zero bytes using inflateSetDictionary() right after inflateInit2(). Then the decompression would proceed, filling in the given locations with zeros. That may or may not be what the game is doing. There is no obvious remedy for the stored-block error.

Indeed, the game's authors may be deliberately messing with you, or with anyone else trying to decompress their internal data, by having modified zlib for their own use.