0
votes

The current content of this google docs page is:

[Screenshot of the current page content: http://www.deviantsart.com/upload/i9k01q.png]

However, when reading this page with the following PHP fopen() script, I get an older, cached version:

[Screenshot of the older, cached version (source: deviantsart.com)]

I've tried two solutions proposed in this question (appending a random query parameter and using POST), and I also tried clearstatcache(), but I always get the cached version of the web page.

What do I have to change in the following script so that fopen() returns the current version of the web page?

<?php
$url = 'http://docs.google.com/View?id=dc7gj86r_32g68627ff&rand=' . getRandomDigits(10);

echo $url . '<hr/>';
echo loadFile($url);

function loadFile($sFilename) {
    clearstatcache();
    if (floatval(phpversion()) >= 4.3) {
        $sData = file_get_contents($sFilename);
    } else {
        if (!file_exists($sFilename)) return -3;

        $opts = array('http' =>
          array(
            'method'  => 'POST',
            'content'=>''
          )
        );
        $context  = stream_context_create($opts);                

        $rHandle = fopen($sFilename, 'r', $context);
        if (!$rHandle) return -2;

        $sData = '';
        while (!feof($rHandle)) {
            // filesize() does not work on http:// URLs, so read fixed-size chunks
            $sData .= fread($rHandle, 8192);
        }
        fclose($rHandle);
    }
    return $sData;
}

function getRandomDigits($numberOfDigits) {
 $r = "";
 for($i=1; $i<=$numberOfDigits; $i++) {
  $nr=rand(0,9);
  $r .=  $nr;
 }
 return $r;
}

?>

ADDED: taking out the $opts and $context gives me a cached page as well:

function loadFile($sFilename) {
    if (floatval(phpversion()) >= 4.3) {
        $sData = file_get_contents($sFilename);
    } else {
        if (!file_exists($sFilename)) return -3;              

        $rHandle = fopen($sFilename, 'r');
        if (!$rHandle) return -2;

        $sData = '';
        while (!feof($rHandle)) {
            // filesize() does not work on http:// URLs, so read fixed-size chunks
            $sData .= fread($rHandle, 8192);
        }
        fclose($rHandle);
    }
    return $sData;
}

ADDED: this curl script which sends a Firefox user agent returns the cached version as well:

<?php
$url = 'http://docs.google.com/View?id=dc7gj86r_32g68627ff';
//$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
$user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)';
$ch = curl_init();
//curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookie");
//curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie");
curl_setopt($ch, CURLOPT_URL, $url ); 
curl_setopt($ch, CURLOPT_FAILONERROR, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); 
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
echo curl_exec($ch);
?>

3 Answers

1
votes

I also get this:

Test One;http://docs.google.com/View?id=dc7gj86r_30dzgzbjch
Test Two;http://docs.google.com/View?id=dc7gj86r_31dbssfrzx

The "caching" must be happening at Google Docs or, more probably, the problem is on your side (wrong URL?).


Response headers:

Set-Cookie: ******
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Sun, 02 May 2010 03:30:29 GMT
X-Frame-Options: ALLOWALL
Content-Encoding: gzip
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Length: 3987
Server: GSE
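
To check these headers from PHP rather than from a browser, something like the following sketch could be used. It assumes the headers are available as an array of raw lines (for example from get_headers() or $http_response_header); parseHeaders() and forbidsCaching() are hypothetical helper names, not built-in functions, and the sample data is copied from the response above.

```php
<?php
// Hypothetical helper: turn raw "Name: value" header lines into an
// associative array with lowercase keys. Lines without a colon are skipped.
function parseHeaders(array $lines): array {
    $headers = [];
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue;
        }
        $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
    }
    return $headers;
}

// Hypothetical helper: true if Cache-Control contains no-cache or no-store.
function forbidsCaching(array $headers): bool {
    $cacheControl = strtolower($headers['cache-control'] ?? '');
    return strpos($cacheControl, 'no-cache') !== false
        || strpos($cacheControl, 'no-store') !== false;
}

// Sample lines copied from the response headers listed above.
$sample = [
    'Content-Type: text/html; charset=UTF-8',
    'Cache-Control: no-cache, no-store, max-age=0, must-revalidate',
    'Pragma: no-cache',
    'Expires: Fri, 01 Jan 1990 00:00:00 GMT',
    'Date: Sun, 02 May 2010 03:30:29 GMT',
];

var_dump(forbidsCaching(parseHeaders($sample))); // bool(true)
```

Against the live page, parseHeaders(get_headers($url)) should give the same kind of array. Headers like these only tell clients and proxies not to cache; they say nothing about which copy the server itself decides to send.
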
2
votes

I have successfully reproduced this. Google IS caching when you aren't the owner of the published web document. When I logged out, it gave me the old version.

After I unpublished it and republished it, I could no longer reproduce the issue. Make sure that you publish the document again under "Share as Web Page" whenever you update it.

Just to make sure, check in a browser that isn't logged in (or with your script). If the page doesn't update, unpublish and publish again. Doing so did not change the URL for me.
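
A rough way to do the logged-out check from a script is to fetch the page without any cookies before and after republishing, then compare the two copies. This is only a sketch: fetchAnonymous() and contentChanged() are hypothetical helper names, and I have not verified that the no-cache request headers make any difference to what Google Docs serves.

```php
<?php
// Hypothetical helper: plain GET with no cookies, so the request looks like
// one from a logged-out visitor. Returns the body, or false on failure.
function fetchAnonymous(string $url) {
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            // Ask caches along the way to revalidate; the origin may ignore this.
            'header'  => "Cache-Control: no-cache\r\nPragma: no-cache\r\n",
            'timeout' => 15,
        ],
    ]);
    return file_get_contents($url, false, $context);
}

// Hypothetical helper: compare two snapshots of the page by hash.
function contentChanged(string $before, string $after): bool {
    return md5($before) !== md5($after);
}

// Usage (needs network access, so it is left commented out):
// $url    = 'http://docs.google.com/View?id=dc7gj86r_32g68627ff';
// $before = fetchAnonymous($url);
// ... edit the document, unpublish it and publish it again ...
// $after  = fetchAnonymous($url);
// var_dump(contentChanged($before, $after));
```

If contentChanged() still reports false after an edit and a republish, the stale copy is coming from the server and not from anything in the PHP script.
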

1
votes

Make sure your browser isn't caching the output of your script. I'm not seeing any cache headers or anything. Your web server might be adding something, or your browser might be assuming the page is cached. Try including the current time in the output so you can confirm that the request was actually made when you expected.
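
A minimal sketch of the timestamp idea, assuming the page body has already been fetched into a string. stampOutput() is a hypothetical helper name, and the short hash is an extra I added so that two fetches are easy to compare at a glance.

```php
<?php
// Hypothetical helper: prefix a fetched body with the request time (UTC) and
// a short hash of the content, so a stale copy of the output is easy to spot.
function stampOutput(string $body, int $requestTime): string {
    $stamp = gmdate('Y-m-d H:i:s', $requestTime) . ' UTC';
    $hash  = substr(md5($body), 0, 8);
    return "Fetched at $stamp (md5 $hash)<hr/>" . $body;
}

// Usage with placeholder content instead of a real fetch:
echo stampOutput('<p>Test One</p>', time());
```

If the time shown in the browser does not advance on reload, the browser or something in front of the web server is caching the script's output. If the time advances but the hash stays the same, the remote page really is being served unchanged.
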

I used fopen years ago for data that updated quite often and never ran into a caching problem with it. In fact, I would be disappointed if the PHP developers had added a web cache to fopen, as it would ruin most of the valid use cases AND it isn't in their documentation. I'll go and look at the PHP source code just to make sure.

Can you update the document so that some of us may try reproducing?