
Downloading an image using cURL

https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg

When saving this image manually from the browser to the local PC, the size reported by the system is 139,880 bytes.

When downloading it using cURL, the file appears to be damaged and is not recognized as a valid image.

Its size, when downloaded using cURL, is 139,845 bytes, which is smaller than the manual download.

Digging into the issue further, I found that the server returns this content length in the response headers:

content-length: 139845

This length is identical to what cURL downloaded, so I suspect that cURL closes the transfer once it reaches the (possibly wrong) length claimed by the server.

Is there any way to make cURL download the file completely even if the Content-Length header is wrong?

Used code:

// cURL init
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.bing.com/');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8');
curl_setopt($ch, CURLOPT_MAXREDIRS, 5); // Good leeway for redirections.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Many login forms redirect at least once.
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");

// cURL GET
$x = 'error';
$url = 'https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg';
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_URL, trim($url));
$exec = curl_exec($ch);
$x = curl_error($ch);

$fp = fopen('test.jpg', 'wb'); // binary mode, overwrite if the file already exists
fwrite($fp, $exec);
fclose($fp);
What's the strlen() of the received data? Consider using the b flag when opening the output file, too! – Ulrich Eckhardt

strlen returns 139845 – Atef

I've tried with curl and wget from the command line (passing the flag to ignore the Content-Length header) and both result in a file that is 139,845 bytes long. I'm starting to wonder if the gzip compression is the problem. – Sean Bright

@SeanBright indeed it is. Use the --compressed flag with curl, and it'll work. – hanshenrik
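As the comments suggest, the byte counts are consistent with the client saving the compressed body verbatim: the advertised wire size is the compressed size, while the real image is larger. A minimal local simulation of that failure mode (pure PHP, no network; the payload is a made-up stand-in for the image):

```php
<?php
// A compressible "image" stand-in (hypothetical payload).
$original = str_repeat('JPEG-like data with some repetition. ', 1000);

// What the server actually sends: the gzip-compressed body.
$wire = gzencode($original);

// A client unaware of the compression saves $wire as-is,
// so the saved file is smaller than the real image...
var_dump(strlen($wire) < strlen($original)); // bool(true)

// ...and only decompressing recovers the original bytes.
var_dump(gzdecode($wire) === $original);     // bool(true)
```

This mirrors the question's numbers: 139,845 compressed bytes on the wire versus the 139,880-byte decoded image.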

2 Answers

2
votes

The server has a bugged implementation of the Accept-Encoding compressed-transfer mechanism. The response is ALWAYS gzip-compressed, but the server won't tell the client that it's gzip-compressed unless the client sends an Accept-Encoding: gzip request header. When the server doesn't announce the compression, the client won't gzip-decompress the body before saving it, hence your corrupted download. Tell cURL to offer gzip compression by setting CURLOPT_ENCODING:

curl_setopt($ch, CURLOPT_ENCODING, 'gzip');

Then the server will tell cURL that the body is gzip-compressed, and cURL will decompress it for you before handing it to PHP.
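Putting that together with the code from the question, a trimmed-down sketch of the fixed download might look like this (option values taken from the question; error handling kept minimal):

```php
<?php
$ch = curl_init('https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);

// The fix: advertise gzip support so the server labels the response
// correctly and cURL transparently decompresses it.
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');

$data = curl_exec($ch);
if ($data === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

// Should now be the full 139,880-byte image, a valid JPEG.
file_put_contents('test.jpg', $data);
```

Passing an empty string ('') instead of 'gzip' would let cURL offer every encoding it supports.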

You should probably tell the server admin about this; it's a serious bug in their web server, corrupting downloads.

0
votes

libcurl has an option for this called CURLOPT_IGNORE_CONTENT_LENGTH. Unfortunately, the constant is not natively defined in PHP, but you can trick PHP into setting the option anyway by using the correct magic number (which, at least on my system, is 136):

if (!defined('CURLOPT_IGNORE_CONTENT_LENGTH')) {
    define('CURLOPT_IGNORE_CONTENT_LENGTH', 136);
}
if (!curl_setopt($ch, CURLOPT_IGNORE_CONTENT_LENGTH, 1)) {
    throw new \RuntimeException('failed to set CURLOPT_IGNORE_CONTENT_LENGTH! - ' . curl_errno($ch) . ': ' . curl_error($ch));
}

You can find the correct number for your system by compiling and running the following C++ code:

#include <iostream>
#include <curl/curl.h>

int main() {
    std::cout << CURLOPT_IGNORE_CONTENT_LENGTH << std::endl;
}
But it's probably 136. Lastly, pro tip: file_get_contents() ignores the Content-Length header altogether and just keeps downloading until the server closes the connection (which is potentially much slower than cURL). Also, you should probably contact the server operator and let them know that something is wrong/bugged with their server.
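Note that ignoring the Content-Length header alone would not fix the corruption seen in the question, since the saved bytes would still be gzip-compressed. As a defensive complement (a sketch, not part of either answer; maybe_gunzip is a hypothetical helper name), you can check for the gzip magic bytes and decompress manually:

```php
<?php
/**
 * Decompress $body if it starts with the gzip magic bytes (\x1f\x8b),
 * otherwise return it unchanged. A fallback for servers that compress
 * responses without announcing it.
 */
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Plain data passes through untouched; gzipped data is decoded.
var_dump(maybe_gunzip('hello') === 'hello');            // bool(true)
var_dump(maybe_gunzip(gzencode('hello')) === 'hello');  // bool(true)
```

The magic-byte check keeps already-valid downloads (like a real JPEG, which starts with \xff\xd8) from being mangled by an unnecessary decode attempt.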