60
votes

When requesting a page with Gzip compression I am getting a lot of the following errors:

System.IO.InvalidDataException: The CRC in GZip footer does not match the CRC calculated from the decompressed data

I am using native GZipStream to decompress and am looking at addressing this. With that in mind is there a work around for addressing this or another GZip library (free?) which will handle this issue properly?

I am verifying the webResponse ContentEncoding is GZIP

Update 5/11 A simplified snippit

//Caller
public void SOSampleGet(string url) 
{
    // Initialize the WebRequest.
    webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = WebRequestMethods.Http.Get;
    webRequest.KeepAlive = true;
    webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
    webRequest.Referer = WebUtil.GetDomain(url);

    HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();    

    using (Stream stream = GetStreamForResponse(webResponse, READTIMEOUT_CONST))
    {
        //use stream
    }
}

//Method
private static Stream GetStreamForResponse(HttpWebResponse webResponse, int readTimeOut)
{
    Stream stream;
    switch (webResponse.ContentEncoding.ToUpperInvariant())
    {
        case "GZIP":
            stream = new GZipStream(webResponse.GetResponseStream(), CompressionMode.Decompress);
            break;
        case "DEFLATE":
            stream = new DeflateStream(webResponse.GetResponseStream(), CompressionMode.Decompress);
            break;

        default:
            stream = webResponse.GetResponseStream();
            stream.ReadTimeout = readTimeOut;
            break;
        }    
    return stream;
}
6
Is it for a specific site, or is this happening from responses everywhere? If its only one site, it could be that the problem lies on the other side.MichaelICE
Note also that "deflate", according to the HTTP spec, is really "zlib" (which wraps deflate), and not deflate at all (it's a misnomer). Because of this confusion, though, some servers will send deflate, and other zlib, and clients need to support both (by heuristic guess) just in case. Yuck.Cameron

6 Answers

135
votes

What about the webrequest AutomaticDecompression Property available since .net 2? Simply add:

webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

It also adds the gzip,deflate to the accept encoding header.

See http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.automaticdecompression.aspx

4
votes

For .NET Core things are a little more involved. A GZipStream is needed as there isn't a property (as of writing) for AutomaticCompression. See my answer here: https://stackoverflow.com/a/44508724/2421277

Code from answer:

var req = WebRequest.CreateHttp(uri);

/*
 * Headers
 */
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";

/*
 * Execute
 */
try
{
    using (var resp = await req.GetResponseAsync())
    {
        using (var str = resp.GetResponseStream())
        using (var gsr = new GZipStream(str, CompressionMode.Decompress))
        using (var sr = new StreamReader(gsr))

        {
            string s = await sr.ReadToEndAsync();  
        }
    }
}
catch (WebException ex)
{
    using (HttpWebResponse response = (HttpWebResponse)ex.Response)
    {
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            string respStr = sr.ReadToEnd();
            int statusCode = (int)response.StatusCode;

            string errorMsh = $"Request ({url}) failed ({statusCode}) on, with error: {respStr}";
        }
    }
}
2
votes

Are you flushing and closing the stream? Try wrapping your GZipStream with a Using Statement.

2
votes

I found some sample code that shows the entire request/response for GZip encoded pages. It uses GZipStream.

http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx

1
votes

See my comment above, but this usually is a symptom of a corrupted file. If the site is your own, replace the file you are trying to access.

-2
votes

The native GZipStream can read a compressed GZIP (RFC 1952) stream, but it can't handle the ZIP file format.

From http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html:

The disadvantage of using the GZipStream class over a 3rd party product is that it has limited capabilities. One of the limitations is that you cannot give a name to the file that you place in the archive. When GZipStream compresses the file into a ZIP archive, it takes the sequence of bytes from that file and uses compression algorithms that create a smaller sequence of bytes. The new sequence of bytes is put into the new ZIP file. When you open the ZIP file you will open the archived file itself; most popular ZIP extractors (WinZip, WinRar, etc.) will show you the content of the ZIP as a file that has the same as the archive itself.


EDIT: The above note is incorrect. GZipStream does not produce a ZIP file. It is not a "Single file ZIP stream". It is a GZIP Stream. They are different things. There's no guarantee that tools that handle ZIP archives will handle a .gz file.


For an implementation that can read ZIP archives, as opposed to single-file ZIP streams, try #ziplib (SharpZipLib, formerly NZipLib).