2
votes

I'm having a problem with writing an uncompressed GZIP stream using SharpZipLib's GZipInputStream. I only seem to be able to get 256 bytes worth of data with the rest not being written to and left zeroed. The compressed stream (compressedSection) has been checked and all data is there (1500+ bytes). The snippet of the decompression process is below:

int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
    msi.Write(compressedSection, 0, compressedSection.Length);
    msi.Position = 0;
    int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0); // Gets little endian value of uncompressed size into an integer

    // SharpZipLib GZip method called
    using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
    {
        using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
        {
            byte[] buffer = new byte[uncompressedIntSize];
            decompressStream.Read(buffer, 0, uncompressedIntSize); // Stream is decompressed and read         
            outputStream.Write(buffer, 0, uncompressedIntSize);
            using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
            {
                fs.Write(buffer, 0, buffer.Length);
                fs.Close();
            }
            outputStream.Close();
        }
        decompressStream.Close();

So in this snippet:

1) The compressed section is passed in, ready to be decompressed.

2) The expected size of the uncompressed output (which is stored in a header with the file as a 2-byte little-endian value) is passed through a method to convert it to integer. The header is removed earlier as it is not part of the compressed GZIP file.

3) SharpLibZip's GZIP stream is declared with the compressed file stream (msi) and a buffer equal to int uncompressedIntSize (have tested with a static value of 4096 as well).

4) I set up a MemoryStream to handle writing the output to a file as GZipInputStream doesn't have Read/Write; it takes the expected decompressed file size as the argument (capacity).

5) The Read/Write of the stream needs byte[] array as the first argument, so I set up a byte[] array with enough space to take all the bytes of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).

6) int GzipInputStream decompressStream uses .Read with the buffer as first argument, from offset 0, using the uncompressedIntSize as the count. Checking the arguments in here, the buffer array still has a capacity of 3584 bytes but has only been given 256 bytes of data. The rest are zeroes.

It looks like the output of .Read is being throttled to 256 bytes but I'm not sure where. Is there something I've missed with the Streams, or is this a limitation with .Read?

2
any time you call stream.Read without reading the result: that's almost certainly the error; Read does not guarantee to read uncompressedIntSize bytes; it is only required to read 0 bytes if the stream is empty, or any number <= uncompressedIntSize otherwise; it could read 1 byte each time; you need to either loop, or call decompressStream.CopyTo(outputStream); instead?Marc Gravell

2 Answers

2
votes

You need to loop when reading from a stream; the lazy way is probably:

decompressStream.CopyTo(outputStream);

(but this doesn't guarantee to stop after uncompressedIntSize bytes - it'll try to read to the end of decompressStream)

A more manual version (that respects an imposed length limit) would be:

const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE);
try
{
    int remaining = uncompressedIntSize, bytesRead;
    while (remaining > 0 && // more to do, and making progress
        (bytesRead = decompressStream.Read(
        buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
    if (remaining != 0) throw new EndOfStreamException();
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
0
votes

The issue turned out to be an oversight I'd made earlier in the posted code:

The file I'm working with has 27 sections which are GZipped, but they each have a header which will break the Gzip decompression if the GZipInput stream hits any of them. When opening the base file, it was starting from the beginning (adjusted by 6 to avoid the first header) each time instead of going to the next post-head offset:

brg.BaseStream.Seek(6, SeekOrigin.Begin);

Instead of:

brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);

This meant that the extracted compressed data was an amalgam of the first headerless section + part of the 2nd section along with its header. As the first section is 256 bytes long without its header, this part was being decompressed correctly by the GZipInput stream. But after that is 6-bytes of header which breaks it, resulting in the rest of the output being 00s.

There was no explicit error being thrown by the GZipInput stream when this happened, so I'd incorrectly assumed that the cause was the .Read or something in the stream retaining data from the previous pass. Sorry for the hassle.