I recently came across the following code sample for encrypting a file with AES-256 CBC with a SHA-256 HMAC for authentication and validation:
aes_key, hmac_key = self.keys
# create a PKCS#7 pad to get us to `len(data) % 16 == 0`
pad_length = 16 - len(data) % 16
data = data + (pad_length * chr(pad_length))
# get IV
iv = os.urandom(16)
# create cipher
cipher = AES.new(aes_key, AES.MODE_CBC, iv)
data = iv + cipher.encrypt(data)
sig = hmac.new(hmac_key, data, hashlib.sha256).digest()
# return the encrypted data (iv, followed by encrypted data, followed by hmac sig):
return data + sig
Since, in my case, I'm encrypting much more than a string, rather a fairly large file, I modified the code to do the following:
aes_key, hmac_key = self.keys
iv = os.urandom(16)
cipher = AES.new(aes_key, AES.MODE_CBC, iv)
with open('input.file', 'rb') as infile:
with open('output.file', 'wb') as outfile:
# write the iv to the file:
outfile.write(iv)
# start the loop
end_of_line = True
while True:
input_chunk = infile.read(64 * 1024)
if len(input_chunk) == 0:
# we have reached the end of the input file and it matches `% 16 == 0`
# so pad it with 16 bytes of PKCS#7 padding:
end_of_line = True
input_chunk += 16 * chr(16)
elif len(input_chunk) % 16 > 0:
# we have reached the end of the input file and it doesn't match `% 16 == 0`
# pad it by the remainder of bytes in PKCS#7:
end_of_line = True
input_chunk_remainder = 16 - (len(input_chunk) & 16)
input_chunk += input_chunk_remainder * chr(input_chunk_remainder)
# write out encrypted data and an HMAC of the block
outfile.write(cipher.encrypt(input_chunk) + hmac.new(hmac_key, data,
hashlib.sha256).digest())
if end_of_line:
break
Simply put, this reads an input file in blocks of 64KB at a time and encrypts these blocks, generating a HMAC using SHA-256 of the encrypted data, and appending that HMAC after each block. Decryption will happen by reading in 64KB + 32B chunks and calculating the HMAC of the first 64KB and comparing it against the SHA-256 sum occupying the last 32 bytes in the chunk.
Is this the right way to use an HMAC? Does it ensure security and authentication that the data was unmodified and decrypted with the right key?
FYI, the AES and HMAC keys are both derived from the same passphrase which is generated by running the input text through SHA-512, then through bcrypt, then through SHA-512 again. The output from the final SHA-512 is then split into two chunks, one used for the AES password and the other used for the HMAC.