I have a system that uploads documents, compresses them with GZip, and stores them on the server. A bug in the code set the stream position to 0 before the written content was flushed, so the GZip header in the stored files was overwritten.

I've created a program to insert the header into the files that are missing it, but I'm unable to unzip the files because the footer no longer matches.

Is it possible to correct the file footer from this state? I had assumed that the checksum covered only the file body and that the footer held the total file length, so I could just add the 10-byte header to the original documents and increment the length counter by 10.

All help appreciated, as these files are quite important.

Thanks

UPDATE:

Following Iain's advice, I've tried decompressing the body of the gzip file using the Deflate algorithm. The code is as follows:

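    // Requires: using System; using System.IO; using System.IO.Compression;

    private void DecompressStream(Stream input, Stream output)
    {
        using (var memStream = new MemoryStream())
        {
            // Copy everything but the last 8 bytes (the gzip footer: CRC-32 + length)
            CopyStream(input, memStream, 8);
            memStream.Position = 0;

            using (var deflate = new DeflateStream(memStream, CompressionMode.Decompress))
            {
                // Now try to decompress the stream as raw Deflate data
                deflate.CopyTo(output);
            }
        }
    }

    // Copies input to output, then drops the last dropBytes bytes from output.
    // output must be seekable and resizable (e.g. a MemoryStream).
    private void CopyStream(Stream input, Stream output, int dropBytes)
    {
        long start = output.Position;
        input.CopyTo(output);

        // Trim after the full copy rather than inside a read loop: subtracting
        // dropBytes from the final read would go negative whenever that read
        // returns fewer than dropBytes bytes.
        long copied = output.Position - start;
        output.SetLength(output.Position - Math.Min(dropBytes, copied));
    }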

The deflate.CopyTo(output) line throws an exception with the message: "Unknown block type. Stream might be corrupted."

UPDATE 2:

Alright! I've fixed it. The loss of the first 10 bytes didn't actually change the content of the stream or the footer. So just adding it back in again caused it to work. There was a small bug in my memory stream logic that meant it didn't work when I fed it straight into the GZip decompression, but when I output it to disk and read it back in, it worked fine.

Thanks for all the help guys, you got me looking at it from a different angle.

UPDATE 3:

Alright, I haven't fixed it. I've got a fix for small files, but larger ones still seem to cause problems. I'm going to continue investigating, and maybe I'll be able to figure out a solution.

How do you know you only overwrote the gzip header? You should not be modifying the gzip trailer. The length there is not the length of the gzip file, but rather the length of the uncompressed data. - Mark Adler
Please provide the first 100 bytes of an example of an overwritten gzip file in hex. - Mark Adler

1 Answer

Certainly you should be able to decompress without the footer, as long as you skip the CRC check and the length check. The size field in the gzip footer is the uncompressed size mod 2^32 (RFC 1952).
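
To see what the footer of one of your files actually contains, something like the following might help. It is only a sketch: the `GzipTrailer` class and its `Read` method are names I made up, and it assumes a seekable stream. Per RFC 1952 the last 8 bytes are the CRC-32 of the uncompressed data followed by the uncompressed length mod 2^32, both little-endian.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of .NET): reads the 8-byte gzip trailer.
static class GzipTrailer
{
    // Returns the stored CRC-32 and ISIZE (uncompressed length mod 2^32).
    // Assumes the stream is seekable and ends exactly at the gzip trailer.
    public static (uint Crc32, uint ISize) Read(Stream gzip)
    {
        if (gzip.Length < 8)
            throw new InvalidDataException("Too short to contain a gzip trailer.");

        var trailer = new byte[8];
        gzip.Seek(-8, SeekOrigin.End);

        // Read may return fewer bytes than requested, so loop until all 8 arrive.
        int total = 0;
        while (total < trailer.Length)
        {
            int n = gzip.Read(trailer, total, trailer.Length - total);
            if (n == 0) throw new EndOfStreamException();
            total += n;
        }

        // Both fields are little-endian regardless of the host platform.
        return (ReadUInt32LE(trailer, 0), ReadUInt32LE(trailer, 4));
    }

    private static uint ReadUInt32LE(byte[] b, int offset)
    {
        return (uint)b[offset]
             | ((uint)b[offset + 1] << 8)
             | ((uint)b[offset + 2] << 16)
             | ((uint)b[offset + 3] << 24);
    }
}
```

If the ISize you read back matches the size of the original document (mod 2^32), the footer is most likely intact and should be left alone.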

I'm not sure you will be able to rebuild a correct footer without decompressing, as you need the original data to compute the CRC. You shouldn't need to hold all the decompressed data in memory, though: the CRC and the length can be updated chunk by chunk as the data streams through.
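
As a rough illustration of rebuilding the footer without buffering everything, here is a sketch that streams the uncompressed data through a fixed-size buffer while updating a CRC-32 and a byte count. The `GzipFooterBuilder` class and `ComputeFooter` method are hypothetical names. The CRC is the standard table-driven CRC-32 (reflected polynomial 0xEDB88320) that gzip uses, written out by hand so the example does not depend on any extra package.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of .NET): computes the 8-byte gzip footer
// for a stream of uncompressed data without holding it all in memory.
static class GzipFooterBuilder
{
    private static readonly uint[] Table = BuildTable();

    // Standard CRC-32 lookup table (reflected polynomial 0xEDB88320).
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Returns CRC-32 followed by length mod 2^32, both little-endian.
    public static byte[] ComputeFooter(Stream uncompressed)
    {
        uint crc = 0xFFFFFFFFu;
        ulong length = 0;
        var buffer = new byte[32768];
        int read;
        while ((read = uncompressed.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
            length += (ulong)read;
        }
        crc ^= 0xFFFFFFFFu;

        var footer = new byte[8];
        WriteUInt32LE(footer, 0, crc);
        WriteUInt32LE(footer, 4, (uint)(length & 0xFFFFFFFFul));
        return footer;
    }

    private static void WriteUInt32LE(byte[] b, int offset, uint value)
    {
        b[offset] = (byte)(value & 0xFF);
        b[offset + 1] = (byte)((value >> 8) & 0xFF);
        b[offset + 2] = (byte)((value >> 16) & 0xFF);
        b[offset + 3] = (byte)((value >> 24) & 0xFF);
    }
}
```

You could feed this the output of a decompression pass and compare the result with the footer already in the file; a mismatch would suggest that more than the header was damaged.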

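In .NET, you can use DeflateStream instead of GZipStream on the compressed body; it needs neither the header nor the footer, and you can rebuild both afterwards. The catch is that without the footer's CRC and length checks, you will have no way of knowing whether any data was corrupted.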
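
A minimal sketch of that idea, under some assumptions: the damaged header occupies exactly the first 10 bytes (a gzip header with no optional fields, which is what I believe GZipStream writes), the last 8 bytes are the footer, the stream is seekable, and the compressed body fits in a byte array. `DamagedGzip` and `DecompressBody` are made-up names.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of .NET): decompresses the raw Deflate body
// of a gzip file whose header bytes can no longer be trusted.
static class DamagedGzip
{
    private const int HeaderSize = 10; // assumes no optional gzip header fields
    private const int FooterSize = 8;  // CRC-32 + ISIZE

    public static void DecompressBody(Stream gzip, Stream output)
    {
        long bodyLength = gzip.Length - HeaderSize - FooterSize;
        if (bodyLength <= 0)
            throw new InvalidDataException("File is too short to contain a Deflate body.");

        // Read only the bytes between the header and the footer.
        var body = new byte[bodyLength];
        gzip.Position = HeaderSize;
        int total = 0;
        while (total < body.Length)
        {
            int n = gzip.Read(body, total, body.Length - total);
            if (n == 0) throw new EndOfStreamException();
            total += n;
        }

        // No CRC or length check happens here, so corruption can go unnoticed.
        using (var deflate = new DeflateStream(new MemoryStream(body), CompressionMode.Decompress))
        {
            deflate.CopyTo(output);
        }
    }
}
```

If this still fails with "Unknown block type", the damage probably extends past the first 10 bytes, or the body does not start where this sketch assumes.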

Deflate Stream (MSDN)