2
votes

I've written a script that transfers local files into a folder structure on a remote FTP server with PHP. I'm currently using ftp_connect() to connect to the remote server and ftp_put() to transfer the file, in this case, a CSV file.

My question is, how would one verify that a file's contents (on the remote FTP server) are not a duplicate of the local file's contents? Is there any way to parse the contents of the remote file in question, as well as a local version and then compare them using a PHP function?

I have tried comparing the size of the local file, using filesize(), with the size of the remote file, using ftp_size(). However, two files with different data but the same number of characters have the same size in bytes, so this check reports a false positive for duplication.

Please note, the FTP in question is not under my control, so I can't put any scripts on the remote server.

Update

Thanks to both Mark and gafreax, here is the final working code:

// Download the remote file to a temporary local copy, then compare MD5 hashes.
$temp_path = $conf['absolute_installation_path'] . 'data/temp/ftp_temp.csv';
$remote_file_duplicate = FALSE; // stays FALSE if the download fails
$temp_local_file = fopen($temp_path, 'w');
// The transfer mode should match the mode used for the upload.
$downloaded = ftp_fget($connection, $temp_local_file, $remote_filename, FTP_ASCII);
// Close the handle so all downloaded data is written to disk before hashing.
fclose($temp_local_file);
if ($downloaded) {
    $temp_hash = md5_file($temp_path);
    $local_hash = md5_file($conf['absolute_installation_path'] . $local_filename);
    $remote_file_duplicate = ($temp_hash === $local_hash);
}
2
You ask "...is there any way to parse the contents of the remote file in question". You could download the file to the local disk and MD5-checksum them, but what would be the point? You might as well just upload your local file rather than download the remote one and check they differ before having to upload the local one anyway. So, I am guessing you don't want to just upload the local file because it is too big, correct? - Mark Setchell
Thanks Mark, no the local file isn't too big, but I was trying to avoid unnecessarily uploading a duplicate file, as the script will be running on a frequent basis, and the modified times on the remote server are important. If the data hasn't changed, then why overwrite the remote file? - Paul Macey

2 Answers

1
votes

You can use a hashing function such as md5() on both files and check whether the two resulting hashes match.

For example:

 $a = file_get_contents('a_local');
 $b = file_get_contents('b_local');
 $a_hash = md5($a);
 $b_hash = md5($b);
 if ($a_hash !== $b_hash)
    echo "Files differ";
 else
    echo "Files are the same";

Comparing MD5 hashes is also useful because it avoids problems with unusual data in the files: only the two hash strings are compared, not the raw contents.
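As a variation on the example above, PHP's md5_file() computes the hash directly from a path, without first loading the whole file into a string. This is a minimal sketch, assuming the remote file has already been downloaded to a local temporary path. The helper name files_have_same_contents() is hypothetical and not part of PHP:

```php
<?php
// Hypothetical helper: returns true when both local files have identical
// contents, judged by comparing their MD5 hashes.
function files_have_same_contents(string $path_a, string $path_b): bool
{
    // Treat a missing or unreadable file as "not the same".
    if (!is_readable($path_a) || !is_readable($path_b)) {
        return false;
    }

    $hash_a = md5_file($path_a);
    $hash_b = md5_file($path_b);

    // md5_file() returns false on failure.
    if ($hash_a === false || $hash_b === false) {
        return false;
    }

    return $hash_a === $hash_b;
}
```

Two files with the same size but different bytes will, in practice, produce different hashes, so this avoids the false positive that a filesize() / ftp_size() comparison gives.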

1
votes

You could also compare the last modified time of each file. You'd upload the local file only if it is more recent than the remote one. See filemtime and ftp_mdtm. Both of those return a UNIX timestamp you can easily compare. This is faster than getting the file contents and calculating a hash.
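A minimal sketch of this comparison, assuming $connection is an open FTP connection as in the question. The helper name local_is_newer() and the variable $local_path are hypothetical. Note that ftp_mdtm() returns -1 when it fails, and not every FTP server supports it, so the sketch falls back to uploading in that case:

```php
<?php
// Hypothetical helper: decides whether the local file should be uploaded,
// given the two UNIX timestamps.
function local_is_newer(int $local_mtime, int $remote_mtime): bool
{
    // ftp_mdtm() returns -1 on failure (for example, when the remote file
    // does not exist or the server does not support the command).
    // Upload in that case.
    if ($remote_mtime === -1) {
        return true;
    }

    return $local_mtime > $remote_mtime;
}

// Usage (not run here, it needs a live FTP connection in $connection):
// $local_mtime  = filemtime($local_path);
// $remote_mtime = ftp_mdtm($connection, $remote_filename);
// if ($local_mtime !== false && local_is_newer($local_mtime, $remote_mtime)) {
//     ftp_put($connection, $remote_filename, $local_path, FTP_ASCII);
// }
```

Keep in mind that timestamps only say which copy is more recent. They do not prove that the contents differ, so this check answers a slightly different question than the hash comparison.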