2
votes

So I'm trying to compress an array of bytes (via a Stream). ExtendedStream is just a class I made that wraps a base stream (a MemoryStream by default). If I take my original data, compress it, then decompress it, and compare the size of the decompressed data to the size of the original data (before compression), the decompressed data turns out to be smaller than the original.

Original length: 51695, Compressed length: 26014, Decompressed length: 48685.

I'm storing tiles (7 bytes each). Compressing using the GZip classes provided in the System.IO.Compression namespace.

    public static ExtendedStream GZipDecompress(ExtendedStream stream)
    {
        ExtendedStream outStream = new ExtendedStream();
        GZipStream decompressStream = new GZipStream(stream.BaseStream, CompressionMode.Decompress, true);
        int b = -1;

        while ((b = decompressStream.ReadByte()) != -1)
        {
            outStream.WriteByte((Byte)b);
        }

        outStream.Seek(0, SeekOrigin.Begin);
        return outStream;
    }

    public static ExtendedStream GZipCompress(ExtendedStream stream)
    {
        ExtendedStream outStream = new ExtendedStream(); // base stream is a memorystream
        GZipStream compressStream = new GZipStream(outStream.BaseStream, CompressionMode.Compress, true);
        compressStream.Write(stream.ToArray(), 0, (int)stream.Length);
        compressStream.Flush();
        outStream.Seek(0, SeekOrigin.Begin);
        return outStream;
    }

Hope that's enough information.

1
First, you should be calling .Dispose on your GZipStream objects when you are done. Also, on write, you need to call .Flush on outStream after you dispose of your GZipStream. Third, if this is .NET 4, use the .CopyTo method on your input stream's underlying member to copy the data from the input to the output (instead of using .ToArray, which will not scale as your streams get large). – Joe
No, you don't need to call Flush on "outStream" when you're decompressing or compressing; the data is already in the stream. Flushing would do nothing anyway, since it's a MemoryStream (and FYI, MemoryStream overrides Flush so that it does nothing). I'll keep the CopyTo method in mind, but GC would just fix everything for me (sloppy design, I know, but this isn't design goal #1). – Yes Man
@Supah, no; Joe is right. You must must must formally close/dispose compression streams. Flush can't empty the buffers unless the stream knows nothing more is inbound, which only happens at close. Limitation of compression. This is absolutely critical in compression (do you think I emphasised that enough yet?). Btw: copying individual bytes is a horrible way to do this. – Marc Gravell♦

1 Answer

1
votes

You must close a compression stream; it can't write all of the data until you do, because the final block can only be written once the compressor knows no more data is coming:

public static ExtendedStream GZipCompress(ExtendedStream stream)
{
    ExtendedStream outStream = new ExtendedStream();
using(var compressStream = new GZipStream(outStream.BaseStream, CompressionMode.Compress, true)) {
        compressStream.Write(stream.GetBuffer(), 0, (int)stream.Length);
    }
    outStream.Seek(0, SeekOrigin.Begin);
    return outStream;
}

Note: normally I'd use CopyTo (or manual looping) to do the write, but since this is memory-based GetBuffer() is cheaper.
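To see the "can't write all the data until close" behaviour concretely, here is a standalone sketch against a plain MemoryStream (your ExtendedStream isn't shown, so this assumes nothing beyond the BCL): Flush alone leaves the output incomplete, and Dispose writes the rest.

```csharp
using System;
using System.IO;
using System.IO.Compression;

class FlushVsDispose
{
    // Returns the output length after Flush and after Dispose,
    // showing that Flush alone leaves the gzip output incomplete.
    public static (long afterFlush, long afterDispose) Measure(byte[] data)
    {
        var outStream = new MemoryStream();
        var gz = new GZipStream(outStream, CompressionMode.Compress, true);
        gz.Write(data, 0, data.Length);
        gz.Flush();
        long afterFlush = outStream.Length;   // final block + gzip trailer still missing

        gz.Dispose();                         // emits the remaining bytes
        return (afterFlush, outStream.Length);
    }

    static void Main()
    {
        var data = new byte[51695];
        new Random(0).NextBytes(data);
        var (afterFlush, afterDispose) = Measure(data);
        Console.WriteLine(afterDispose > afterFlush); // True: Dispose wrote bytes Flush could not
    }
}
```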

Your decompression can be simply:

decompressStream.CopyTo(outStream);

Throughout both parts of this, the biggest problem (and the one that has bitten you here) is not disposing your disposable objects. You must do this; it is a requirement of the API. To avoid prematurely disposing the outer stream, pass "true" (leaveOpen) to the GZipStream constructor. For example:

using(var decompressStream = new GZipStream(stream.BaseStream, CompressionMode.Decompress, true)) {
    decompressStream.CopyTo(outStream);
}
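
For reference, a self-contained round-trip sketch against a plain MemoryStream (only BCL types are assumed, since ExtendedStream isn't shown in the question). Once the compression stream is disposed, the decompressed length matches the original:

```csharp
using System;
using System.IO;
using System.IO.Compression;

class RoundTripDemo
{
    public static byte[] Compress(byte[] data)
    {
        var outStream = new MemoryStream();
        // leaveOpen: true, so disposing the GZipStream finishes the gzip
        // output but keeps outStream usable afterwards
        using (var gz = new GZipStream(outStream, CompressionMode.Compress, true))
        {
            gz.Write(data, 0, data.Length);
        }
        return outStream.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        var outStream = new MemoryStream();
        using (var gz = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        {
            gz.CopyTo(outStream); // .NET 4+
        }
        return outStream.ToArray();
    }

    static void Main()
    {
        var original = new byte[51695];
        new Random(0).NextBytes(original);
        byte[] roundTrip = Decompress(Compress(original));
        Console.WriteLine(roundTrip.Length); // 51695, not 48685
    }
}
```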