
This is my code:

$url = 'http://www.douban.com/';

$url = str_replace(" ", "%20", $url);
$TheURL_header = substr($url, 0, 7);
if ($TheURL_header == "http://") {
    $pos = strpos($url, "/", 7);
    if ($pos) {
        $host = substr($url, 7, $pos - 7);
    } else {
        $host = substr($url, 7);
    }
    $referer = "http://" . $host . "/";
} else if ($TheURL_header == "https:/") {
    $pos = strpos($url, "/", 8);
    if ($pos) {
        $host = substr($url, 8, $pos - 8);
    } else {
        $host = substr($url, 8);
    }
    $referer = "https://" . $host . "/";
} else {
    $pos = strpos($url, "/");
    if ($pos) {
        $host = substr($url, 0, $pos);
    } else {
        $host = substr($url, 0);
    }
    $url = "http://" . $url;
    $referer = "http://" . $host . "/";
}

$c = curl_init();
$curl_header = array(
    'Accept: */*',
    'Referer: ' . $referer,
    'User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.2) Gecko/20090803 Ubuntu/9.04 (jaunty) Shiretoko/3.5.2',
    'Host: ' . $host,
    'Connection: Keep-Alive');
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_HTTPHEADER, $curl_header);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($c, CURLOPT_TIMEOUT, 30);
curl_setopt($c, CURLOPT_HEADER, 0);

$res = curl_exec($c);

echo $res;

It works fine when I set $url = 'http://www.google.com', but if I change $url to something like www.aoguejewellery.com or certain other URLs, I always get a 403 error.

403 is not a Bad Request error but a Forbidden error. The status code for Bad Request is 400. - BoltClock♦

The site may be using cookies to track the session. I didn't post this as an answer because I have more experience using Java to grab web pages, but I had a similar problem and had to use the Apache HttpClient Java libraries to fully emulate a browser and handle the cookie management. Just something to look into. - Matt Phillips

Drop all that substr/strpos nonsense and use urlencode() :p - hanshenrik

Not able to reproduce; it works for me. Maybe your IP is blocked or something. By the way, add curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1); to your cURL settings so that cURL follows redirects. - Agnius Vasiliauskas
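
As a rough illustration of the suggestion to drop the manual substr/strpos parsing, the host and Referer could be derived with PHP's built-in parse_url(). Note that this swaps in parse_url() rather than the urlencode() mentioned in the comment, since urlencode() applied to a whole URL would also escape the "://" and "/" separators. The function name host_and_referer is hypothetical, and the sketch assumes (as the question's code does) that a URL without a scheme should be treated as http.

```php
<?php
// Hypothetical helper, not from the original post: derive the host and a
// Referer value from a URL using parse_url() instead of substr()/strpos().
function host_and_referer(string $url): array
{
    // Same space handling as the question's code.
    $url = str_replace(' ', '%20', $url);

    // Assumption: a URL without an http/https scheme is treated as http.
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Cannot parse URL: $url");
    }

    $scheme = strtolower($parts['scheme']);
    $host   = $parts['host'];

    return [
        'url'     => $url,
        'host'    => $host,
        'referer' => $scheme . '://' . $host . '/',
    ];
}
```

Calling host_and_referer('www.aoguejewellery.com') would give the same host and an http:// Referer that the third branch of the question's code builds by hand. Unlike the question's code, parse_url() reports any port number separately from the host.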

1 Answer


Try removing the line curl_setopt($c, CURLOPT_CUSTOMREQUEST, 'GET'); and then run it again. GET is already cURL's default request method, so the option is not needed.
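
A minimal sketch of what the request might look like with that line removed, also folding in the two ideas from the comments (following redirects and keeping cookies). The function names build_get_options and fetch and the cookie-file path are hypothetical, and nothing here is confirmed to cure the 403 for any particular site; it only shows the options involved. The explicit Host header is left out because cURL sets it from the URL on its own.

```php
<?php
// Hypothetical helper: cURL options for a plain GET request.
// CURLOPT_CUSTOMREQUEST is deliberately absent; GET is cURL's default.
function build_get_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,   // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,      // arbitrary cap, an assumption
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_HEADER         => false,
        // Same User-Agent string as in the question.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.2) Gecko/20090803 Ubuntu/9.04 (jaunty) Shiretoko/3.5.2',
        // Read cookies from, and write them back to, the same file.
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
    ];
}

// Hypothetical helper: run the request and report the status code so a
// 403 can be told apart from a transport-level failure.
function fetch(string $url, string $cookieFile): array
{
    $c = curl_init();
    curl_setopt_array($c, build_get_options($url, $cookieFile));

    $body   = curl_exec($c);                        // false on failure
    $status = curl_getinfo($c, CURLINFO_HTTP_CODE); // 0 if no response
    $error  = curl_error($c);                       // '' if no error

    return ['status' => $status, 'body' => $body, 'error' => $error];
}
```

For example, $res = fetch('http://www.douban.com/', '/tmp/cookies.txt'); followed by echo $res['status']; would print the final status code after any redirects, which makes it easier to see whether the server is still answering 403.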