3
votes

I recently came across the following code sample for encrypting a file with AES-256 CBC with a SHA-256 HMAC for authentication and validation:

aes_key, hmac_key = self.keys
# create a PKCS#7 pad to get us to `len(data) % 16 == 0`
pad_length = 16 - len(data) % 16
data = data + (pad_length * chr(pad_length))
# get IV
iv = os.urandom(16)
# create cipher
cipher = AES.new(aes_key, AES.MODE_CBC, iv)
data = iv + cipher.encrypt(data)
sig = hmac.new(hmac_key, data, hashlib.sha256).digest()
# return the encrypted data (iv, followed by encrypted data, followed by hmac sig):
return data + sig

Since, in my case, I'm encrypting much more than a string, rather a fairly large file, I modified the code to do the following:

aes_key, hmac_key = self.keys
iv = os.urandom(16)
cipher = AES.new(aes_key, AES.MODE_CBC, iv)

with open('input.file', 'rb') as infile:
    with open('output.file', 'wb') as outfile:
        # write the iv to the file:
        outfile.write(iv)

        # start the loop
        end_of_line = True

        while True:
            input_chunk = infile.read(64 * 1024)

            if len(input_chunk) == 0:
                # we have reached the end of the input file and it matches `% 16 == 0`
                # so pad it with 16 bytes of PKCS#7 padding:
                end_of_line = True
                input_chunk += 16 * chr(16)
            elif len(input_chunk) % 16 > 0:
                # we have reached the end of the input file and it doesn't match `% 16 == 0`
                # pad it by the remainder of bytes in PKCS#7:
                end_of_line = True
                input_chunk_remainder = 16 - (len(input_chunk) & 16)
                input_chunk += input_chunk_remainder * chr(input_chunk_remainder)

            # write out encrypted data and an HMAC of the block
            outfile.write(cipher.encrypt(input_chunk) + hmac.new(hmac_key, data, 
                    hashlib.sha256).digest())

            if end_of_line:
                break

Simply put, this reads an input file in blocks of 64KB at a time and encrypts these blocks, generating a HMAC using SHA-256 of the encrypted data, and appending that HMAC after each block. Decryption will happen by reading in 64KB + 32B chunks and calculating the HMAC of the first 64KB and comparing it against the SHA-256 sum occupying the last 32 bytes in the chunk.

Is this the right way to use an HMAC? Does it ensure security and authentication that the data was unmodified and decrypted with the right key?

FYI, the AES and HMAC keys are both derived from the same passphrase which is generated by running the input text through SHA-512, then through bcrypt, then through SHA-512 again. The output from the final SHA-512 is then split into two chunks, one used for the AES password and the other used for the HMAC.

2

2 Answers

6
votes

Yes, there are 2 security problems.

But first, I assume that with this statement at the end:

# write out encrypted data and an HMAC of the block
outfile.write(cipher.encrypt(input_chunk) + hmac.new(hmac_key, data, hashlib.sha256).digest())

you actually meant:

# write out encrypted data and an HMAC of the block
data = cipher.encrypt(input_chunk)
outfile.write(data + hmac.new(hmac_key, data, hashlib.sha256).digest())

Because data is not defined anywhere.

The 1st security problem is that you are authenticating each piece independently of the others, but not the composition. In other words, the attacker can reshuffle, duplicate, or remove any of the chunks and the receiver will not notice.

A more secure approach is to have one instance of HMAC only, pass all the encrypted data to it via the update method, and output one digest, at the very end.

Alternatively, if you want to enable the receiver to detect tampering before receiving the whole file, you can output the intermediate MAC for each piece. In fact, a call to digest does not change the state of the HMAC; you can keep calling update afterwards.

The 2nd security problem is that you don't use salt for your key derivation (I say that because you don't send it). Apart from password cracking, if you encrypt more than 2 files using the same password the attacker will also be able to freely mix chunks taken by either encrypted file - because the HMAC key is the same. Solution: use salt.

One last minor thing: infile.read(64 * 1024) may return less than 64*1024 bytes, but that does not mean you reached the end of the file.

-2
votes

I don't think there is a security problem with what you're doing with the HMACs (not that that means there isn't a problem with the security), but I don't know the actual value in HMAC sub elements of the ciphertext gets you. Unless you want to support partial recovery of the plaintext in the event of tampering, there is not much reason to incur the overhead of HMACing 64 KB blocks, vs the full ciphertext.

From a key generation perspective, it might make more sense to use a key generated from a passphrase to encrypt two randomly generated keys, and then use the randomly generated keys to perform HMAC and AES operations. I know using the same key for both your block cipher and HMAC is bad news, but I don't know if using a key generated in the same manner is similarly bad.

At the very least, you should tweak your key derivation mechanism. bcrypt is a password hashing mechanism, not a key derivation function. You should use PBKDF2 to do key derivations.