0
votes

I saved this piece of code as hash.py. When I use it to compute the hash of that same file, the result is completely different from what the built-in md5sum command gives (I'm on Kubuntu 13.04). Why is that? Aren't they both supposed to produce the same result? I should also mention that on huge files (I tested a 4.5 GB ISO file) the built-in md5sum takes at least 7 seconds, but this Python script is almost instant.

""" filename: hash.py """
import sys
import hashlib
file_name = sys.argv[0]
hash_obj = hashlib.md5(file_name)
print "MD5 - "+ hash_obj.hexdigest()

Output:

meow@VikkyHacks:~/Arena/py$ python hash.py 
MD5 - d18a4085140ad0c8ee7671d8ba2065fc

Output from the inbuilt default command:

meow@VikkyHacks:~/Arena/py$ md5sum hash.py 
5299f3588cb0de6cf27930181be73e80  hash.py

2 Answers

2
votes

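In the first case you are hashing the file name; in the second you are hashing the file's contents. That is also why the script seems instant: it hashes only the few characters of the name, while md5sum has to read the whole file.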
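To make the difference concrete, here is a minimal sketch that hashes a path as a string and then hashes the bytes stored at that path. The helper names `md5_of_text` and `md5_of_file` are made up for this illustration, and the temporary file only stands in for hash.py. It is written for Python 3, where `hashlib.md5` needs bytes, so the string is encoded first.

```python
import hashlib
import os
import tempfile


def md5_of_text(text):
    """Return the MD5 of the string itself (what hashing a file name does)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def md5_of_file(path):
    """Return the MD5 of the bytes stored in the file at `path`."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


if __name__ == "__main__":
    # Create a throwaway file so the example does not depend on any real file.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
        tmp.write(b"some file contents\n")
        path = tmp.name
    try:
        print("MD5 of the name    :", md5_of_text(path))
        print("MD5 of the contents:", md5_of_file(path))
    finally:
        os.remove(path)
```

The two printed digests differ because they are computed over different input. Only the second one should match what md5sum reports for the same file.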

1
votes

You are extracting the file path from sys.argv[0] and computing its MD5 (that is, the MD5 of the path as a string). To compute the MD5 of the file contents, use:

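import sys
import hashlib

# sys.argv[0] is the path of the running script itself;
# use sys.argv[1] to hash a file named on the command line instead.
file_path = sys.argv[0]
# Open in binary mode so the exact bytes on disk are hashed.
with open(file_path, 'rb') as file_handle:
    file_contents = file_handle.read()
    print('MD5 - ' + hashlib.md5(file_contents).hexdigest())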

EDIT

Using hashlib.md5(open(file_path, 'rb').read()) is bad practice because the file is never explicitly closed.
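For very large files, such as the 4.5 GB ISO mentioned in the question, reading everything with a single read() call keeps the whole file in memory. One possible alternative is to feed the hash object in chunks. This is only a sketch: the function name `md5_of_large_file` and the 1 MiB default chunk size are assumptions chosen for illustration, not something from the answers above.

```python
import hashlib


def md5_of_large_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_large_file(sys.argv[1])` should give the same digest as hashing the full contents at once, since update() on successive chunks is equivalent to hashing their concatenation. It still has to read every byte, so it should not be expected to finish instantly on a multi-gigabyte file.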