15
votes

I'm using PHP's file_get_contents() function to make an HTTP request. To save bandwidth I decided to add the "Accept-Encoding: gzip" header using stream_context_create().

As expected, file_get_contents() then returns a gzip-encoded string, so I'm using gzuncompress() to decode it, but I get an error about the data passed as the argument.

[...] PHP Warning: gzuncompress(): data error in /path/to/phpscript.php on line 26

I know there is another function that can decompress gzipped data, gzdecode(), but it isn't included in my PHP version (maybe it is only available on SVN).

I know that cURL decodes the gzip stream on the fly (without any problem), but someone suggested that I use file_get_contents() instead of cURL.

Do you know any other way to decompress gzipped data in PHP or why gzuncompress() outputs a Warning? It is absurd that gzuncompress() doesn't work as expected.

Note: the problem is certainly on the PHP side: the HTTP request is made to the Tumblr API, which gives a well-encoded response.

3
Do you know why they suggested to use file_get_contents instead of cUrl? - Jonathan
No, I don't know, they said "it's better". I can go back to cUrl but I'm anyway curious about gzuncompress() issue. - Fabio Buda
Is it because the data is base64 encoded too? - Paul Bain
Are you sure file_get_contents isn't doing the decompression for you? It's a long shot, I know... Try dumping the contents of the file and checking for the gzip magic number 0x1f8b at the start of the file. - Jonathan
No, even with base64_decode() added I get the same error. - Fabio Buda

3 Answers

13
votes

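gzuncompress() won't work for the gzip encoding. It decompresses zlib-format data (RFC 1950, as produced by gzcompress()), whereas "Content-Encoding: gzip" uses the gzip container (RFC 1952), which has a different header and trailer.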

The manual lists a few workarounds for the missing gzdecode() (see comment #82930), or just use the one from upgradephp, or the gzopen temp file workaround.

Another option would be forcing the deflate encoding with the Accept-Encoding: header and then using gzinflate() for decompression.
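
To make the difference between the three formats concrete, here is a minimal sketch. It assumes a PHP build with the zlib extension and a built-in gzdecode() (PHP 5.4 or later); the sample string is made up for illustration.

```php
<?php
// Hypothetical sample payload, used only for illustration.
$payload = 'hello, gzip';

$gzip = gzencode($payload);   // gzip container (RFC 1952): what "Content-Encoding: gzip" carries
$zlib = gzcompress($payload); // zlib container (RFC 1950): what gzuncompress() expects
$raw  = gzdeflate($payload);  // raw DEFLATE stream (RFC 1951): what gzinflate() expects

// gzip data starts with the magic bytes 0x1f 0x8b.
echo bin2hex(substr($gzip, 0, 2)), "\n"; // prints "1f8b"

// Each decoder accepts its own container.
var_dump(gzdecode($gzip) === $payload);     // bool(true)
var_dump(gzuncompress($zlib) === $payload); // bool(true)
var_dump(gzinflate($raw) === $payload);     // bool(true)

// Feeding gzip data to gzuncompress() fails; @ hides the "data error" warning.
var_dump(@gzuncompress($gzip));             // bool(false)

// For data produced by gzencode(), the fixed header is 10 bytes and the
// trailer 8 bytes, so the middle part is a raw DEFLATE stream.
var_dump(gzinflate(substr($gzip, 10, -8)) === $payload); // bool(true)
```

The last line is the idea behind the substr()-based gzdecode() workarounds: strip the gzip wrapper and hand the rest to gzinflate().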

32
votes

Found this working for me: http://www.php.net/manual/en/function.gzdecode.php#106397

Optionally try: http://digitalpbk.com/php/file_get_contents-garbled-gzip-encoding-website-scraping

if ( ! function_exists('gzdecode'))
{
    /**
     * Decode gz coded data
     * 
     * http://php.net/manual/en/function.gzdecode.php
     * 
     * Alternative: http://digitalpbk.com/php/file_get_contents-garbled-gzip-encoding-website-scraping
     * 
     * @param string $data gzencoded data
     * @return string inflated data
     */
    function gzdecode($data) 
    {
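        // Strip the fixed 10-byte gzip header and the 8-byte footer, then inflate.
        // This assumes the header carries no optional fields (file name, comment, extra data).

        return gzinflate(substr($data, 10, -8));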
    }
}
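
The one-liner above only works when the gzip header has no optional fields. As a hedged sketch of how those fields could be skipped, the function below (the name gzdecode_fallback is hypothetical, not part of PHP) reads the flag byte defined in RFC 1952 and advances past FEXTRA, FNAME, FCOMMENT and FHCRC before inflating. It does not verify the CRC32 in the trailer.

```php
<?php
/**
 * Hypothetical fallback for gzdecode() that skips optional gzip header fields.
 * The trailer (CRC32 + uncompressed size) is stripped but not verified.
 *
 * @param string $data gzip-encoded data
 * @return string|false inflated data, or false on malformed input
 */
function gzdecode_fallback(string $data)
{
    $len = strlen($data);
    // 10-byte fixed header + 8-byte trailer is the minimum; check the magic bytes.
    if ($len < 18 || substr($data, 0, 2) !== "\x1f\x8b") {
        return false;
    }

    $flags = ord($data[3]); // FLG byte
    $pos = 10;              // first byte after the fixed header

    if ($flags & 0x04) { // FEXTRA: 2-byte little-endian length, then that many bytes
        $pos += 2 + unpack('v', substr($data, $pos, 2))[1];
    }
    foreach ([0x08, 0x10] as $bit) { // FNAME, then FCOMMENT: zero-terminated strings
        if ($flags & $bit) {
            $end = $pos < $len ? strpos($data, "\0", $pos) : false;
            if ($end === false) {
                return false;
            }
            $pos = $end + 1;
        }
    }
    if ($flags & 0x02) { // FHCRC: 2-byte header checksum
        $pos += 2;
    }
    if ($pos > $len - 8) {
        return false; // optional fields ran past the end of the data
    }

    // What remains between the header and the 8-byte trailer is raw DEFLATE.
    return @gzinflate(substr($data, $pos, -8));
}
```

For example, gzdecode_fallback(gzencode('abc')) returns 'abc', and the same holds for a stream whose header carries an embedded file name.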
1
votes

Before decompressing the data you need to reassemble it. So if the response headers contain

Transfer-Encoding: chunked

you need to unchunk it first.

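function http_unchunk($data) {
    $res = [];
    $p = 0;
    $n = strlen($data);
    while ($p < $n) {
        // each chunk starts with its size in hex, terminated by CRLF
        if (preg_match("/^([0-9A-Fa-f]+)\r\n/", substr($data, $p, 18), $m)) {
            $sz = hexdec($m[1]);
            $p += strlen($m[0]);
            $res[] = substr($data, $p, $sz);
            $p += $sz + 2; // skip the chunk data and its trailing CRLF
        } else {
            break; // not a chunk header: stop
        }
    }
    return implode('', $res);
}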

If Content-Encoding is gzip or x-gzip, use gzdecode(); if Content-Encoding is deflate, use gzinflate().

...
if ($chunked) $body=http_unchunk($body);
if ($gzip) $body=gzdecode($body);
if ($deflate) $body=gzinflate($body);
...
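
The dispatch above can be sketched as one helper. This is only an illustration: decode_http_body and its parameters are hypothetical names, and it assumes gzdecode() is available and the body has already been unchunked. Because HTTP specifies "deflate" as zlib-wrapped data while some servers send a raw DEFLATE stream, the sketch tries gzuncompress() first and falls back to gzinflate().

```php
<?php
/**
 * Hypothetical helper: decode a response body according to its
 * Content-Encoding header line.
 *
 * @param string   $body        raw (already unchunked) response body
 * @param string[] $headerLines header lines such as "Content-Encoding: gzip"
 * @return string|false decoded body, or false if decoding fails
 */
function decode_http_body(string $body, array $headerLines)
{
    $name = 'Content-Encoding:';
    $encoding = '';
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so compare with stripos().
        if (stripos($line, $name) === 0) {
            $encoding = strtolower(trim(substr($line, strlen($name))));
        }
    }

    switch ($encoding) {
        case 'gzip':
        case 'x-gzip':
            return @gzdecode($body);
        case 'deflate':
            // Try the zlib container first, then a raw DEFLATE stream.
            $out = @gzuncompress($body);
            return $out !== false ? $out : @gzinflate($body);
        default:
            return $body; // no or unknown encoding: return unchanged
    }
}
```

With file_get_contents(), the header lines could come from the $http_response_header array that the HTTP wrapper fills in after the request.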