
I wrote some PHP code that downloads an XML file, sent via mail from Google AdWords, and loads the data into a MySQL database.

One of the functions downloads the file to a webspace, but its size differs from that of the file I downloaded manually via Chrome (1.8 MB vs. 1.6 MB). Visually, there is no difference between them.

The manually downloaded file can be processed, but the file downloaded via cURL can't be processed by SimpleXML.

Here is the code of the download function:

function downloadUrlToFile( $url, $outFileName ) {
    if ( is_file( $url ) ) {
        copy( $url, $outFileName );
    } else {
        $options = array(
            CURLOPT_FILE => fopen( $outFileName, 'w' ),
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_HEADER => false,
            CURLOPT_TIMEOUT => 28800,
            CURLOPT_URL => $url,
            CURLOPT_HTTPHEADER => array(
                'Host   hostname',
                'User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0',
                'Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Language    en-US,en;q=0.5',
                'Accept-Encoding    gzip, deflate',
                'Connection keep-alive', )

        );
        $ch = curl_init();
        curl_setopt_array( $ch, $options );
        curl_exec( $ch );
    }
}

Edit:

Here is the new code, which does not work either. I added the : to the headers, but now the file is empty (2 KB).

function downloadUrlToFile( $url, $outFileName ) {
    if ( is_file( $url ) ) {
        copy( $url, $outFileName );
    } else {
        $options = array(
            CURLOPT_FILE => fopen( $outFileName, 'w' ),
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_HEADER => false,
            CURLOPT_TIMEOUT => 28800,
            CURLOPT_URL => $url,

            CURLOPT_HTTPHEADER => array(
                'Host: '.gethostname(),
                'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0',
                'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Language:    en-US,en;q=0.5',
                'Accept-Encoding:    gzip, deflate',
                'Connection: keep-alive', )

        );

        $ch = curl_init();
        curl_setopt_array( $ch, $options );
        curl_exec( $ch );
    }
}
Aren't all your HTTP headers missing the :? - Álvaro González
I added the : to the code, but now the downloaded file is only 2 KB - MrYeti
You say there's no visual difference between the expected and the obtained file. But surely when cropping from 1.8 MB to 2 KB something must be lost. How are you inspecting the files? - Álvaro González
There is no visual difference between the 1.8 MB and the 1.6 MB files. The 2 KB file is empty. Both files seem to be correct XML files with opening and closing tags and everything in between. - MrYeti

1 Answer


Now it works.

The XML files weren't the same, although I initially thought they were.
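
For anyone in a similar spot: two files that look alike in an editor can still differ, and a line-by-line comparison shows this quickly. Below is a minimal sketch, assuming both files fit in memory. firstDifference() is a hypothetical helper written for illustration, not part of the original code.

```php
<?php
/**
 * Hypothetical helper: return the first line at which two text files differ,
 * or null if they are identical line by line.
 */
function firstDifference(string $pathA, string $pathB): ?array
{
    // Read both files into arrays of lines, without trailing newlines.
    $a = file($pathA, FILE_IGNORE_NEW_LINES);
    $b = file($pathB, FILE_IGNORE_NEW_LINES);
    if ($a === false || $b === false) {
        throw new RuntimeException('Could not read one of the files.');
    }

    // Walk up to the length of the longer file so that extra lines count as a difference.
    $max = max(count($a), count($b));
    for ($i = 0; $i < $max; $i++) {
        $lineA = $a[$i] ?? null; // null means "this file has no such line"
        $lineB = $b[$i] ?? null;
        if ($lineA !== $lineB) {
            return ['line' => $i + 1, 'a' => $lineA, 'b' => $lineB];
        }
    }

    return null;
}
```

Calling firstDifference() with the path of the manually downloaded file and the path of the cURL download returns the 1-based line number together with both versions of that line, which would make a difference such as translated labels visible.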

Google sends the XML files in English when they are downloaded by a server. When downloaded manually, they are sent in the browser's language.
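
For reference, here is a cleaned-up sketch of the download function. It rests on several assumptions of mine rather than on anything Google documents: that the Host header is best left for cURL to derive from the URL, that CURLOPT_ENCODING should be used so cURL decompresses gzip responses itself, and that an Accept-Language header might influence the report language (I have not verified that Google honours it). buildRequestHeaders() and the $acceptLanguage parameter are hypothetical names added for illustration.

```php
<?php
/**
 * Hypothetical helper: build well-formed "Name: value" request headers.
 * No Host header is set here; cURL derives it from the URL.
 */
function buildRequestHeaders(string $acceptLanguage): array
{
    return [
        'Accept: application/xml,text/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: ' . $acceptLanguage,
    ];
}

function downloadUrlToFile(string $url, string $outFileName, string $acceptLanguage = 'en-US,en;q=0.5'): void
{
    // Local path: a plain copy is enough.
    if (is_file($url)) {
        if (!copy($url, $outFileName)) {
            throw new RuntimeException("Could not copy $url to $outFileName");
        }
        return;
    }

    $fp = fopen($outFileName, 'w');
    if ($fp === false) {
        throw new RuntimeException("Could not open $outFileName for writing");
    }

    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_FILE           => $fp,   // stream the response body into the file
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '',    // accept any supported encoding and decode it automatically
        CURLOPT_FAILONERROR    => true,  // treat HTTP status >= 400 as a failure
        CURLOPT_TIMEOUT        => 28800,
        CURLOPT_HTTPHEADER     => buildRequestHeaders($acceptLanguage),
    ]);

    $ok    = curl_exec($ch);
    $error = curl_error($ch);
    fclose($fp); // flush and close the file before reporting the result

    if ($ok === false) {
        throw new RuntimeException("Download failed: $error");
    }
}
```

Unlike the versions in the question, this one closes the file handle and throws on failure, so a truncated or empty download surfaces as an error rather than as a small file on disk.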