I have a function that calculates the MD5 hash of every file on a drive. It produces a hash, but the result differs from the hash I get from other programs and online services designed for that purpose.
```python
import hashlib
import os


def md5_files(path, blocksize = 2**20):
    hasher = hashlib.md5()
    hashes = {}
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root, file)
            print(file_path)
            with open(file_path, "rb") as f:
                data = f.read(blocksize)
                if not data:
                    break
                hasher.update(data)
                hashes[file_path] = hasher.hexdigest()
    return hashes
```
The `path` provided is the drive letter, for example `K:\`. I then walk through the files and open each one for binary reading, reading chunks of data of the size specified in `blocksize`. Finally, I store the file name and MD5 hash of every file in a dictionary called `hashes`.

The code looks okay to me, and I also checked other questions on Stack Overflow, but I don't know why the generated MD5 hash is wrong.
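Two things in the code above would explain mismatched hashes: a single `hasher` is shared by every file, so each digest also covers all previously hashed files, and `f.read(blocksize)` is called only once, so only the first block of each file is hashed. The `break` also exits the `for file in files` loop, not just the reading. A minimal sketch of a fix, assuming the goal is one independent MD5 per file, creates a new hasher per file and keeps reading until `f.read` returns empty bytes:

```python
import hashlib
import os


def md5_files(path, blocksize=2**20):
    """Return a dict mapping each file path under `path` to its MD5 hex digest."""
    hashes = {}
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root, file)
            # A fresh hasher per file, so earlier files do not affect this digest.
            hasher = hashlib.md5()
            with open(file_path, "rb") as f:
                # Keep reading until f.read returns b"" (end of file).
                while True:
                    data = f.read(blocksize)
                    if not data:
                        break  # leaves only the while loop, not the file loop
                    hasher.update(data)
            # Store the digest only after the whole file has been read.
            hashes[file_path] = hasher.hexdigest()
    return hashes
```

Usage would be `hashes = md5_files("K:\\")` (the doubled backslash is needed in a normal Python string literal). On Python 3.11 and later, `hashlib.file_digest(f, "md5")` performs the same chunked read for a file opened in binary mode.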
… `hasher` for each file. – Aran-Fey
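The comment points at the shared `hasher`. A quick way to see the effect, using two hypothetical byte strings in place of real file contents: successive `update` calls accumulate, so a hasher reused for a second file reports the MD5 of both files concatenated rather than of the second file alone.

```python
import hashlib

# Hypothetical stand-ins for the contents of two files.
first = b"contents of file 1"
second = b"contents of file 2"

# One hasher reused for both "files", as in the question's code.
shared = hashlib.md5()
shared.update(first)
shared.update(second)

# The digest covers first + second, not just the second file.
print(shared.hexdigest() == hashlib.md5(first + second).hexdigest())  # True
print(shared.hexdigest() == hashlib.md5(second).hexdigest())          # False
```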