HMAC-SHA256 with AES-256 in CBC mode

Question

I recently came across the following code sample for encrypting a file with AES-256 CBC with a SHA-256 HMAC for authentication and validation:

aes_key, hmac_key = self.keys
# create a PKCS#7 pad to get us to `len(data) % 16 == 0`
pad_length = 16 - len(data) % 16
data = data + (pad_length * chr(pad_length))
# get IV
iv = os.urandom(16)
# create cipher
cipher = AES.new(aes_key, AES.MODE_CBC, iv)
data = iv + cipher.encrypt(data)
sig = hmac.new(hmac_key, data, hashlib.sha256).digest()
# return the encrypted data (iv, followed by encrypted data, followed by hmac sig):
return data + sig

Since, in my case, I'm encrypting much more than a string, rather a fairly large file, I modified the code to do the following:

aes_key, hmac_key = self.keys
iv = os.urandom(16)
cipher = AES.new(aes_key, AES.MODE_CBC, iv)

with open('input.file', 'rb') as infile:
    with open('output.file', 'wb') as outfile:
        # write the iv to the file:
        outfile.write(iv)

        # start the loop
        end_of_line = True

        while True:
            input_chunk = infile.read(64 * 1024)

            if len(input_chunk) == 0:
                # we have reached the end of the input file and it matches `% 16 == 0`
                # so pad it with 16 bytes of PKCS#7 padding:
                end_of_line = True
                input_chunk += 16 * chr(16)
            elif len(input_chunk) % 16 > 0:
                # we have reached the end of the input file and it doesn't match `% 16 == 0`
                # pad it by the remainder of bytes in PKCS#7:
                end_of_line = True
                input_chunk_remainder = 16 - (len(input_chunk) & 16)
                input_chunk += input_chunk_remainder * chr(input_chunk_remainder)

            # write out encrypted data and an HMAC of the block
            outfile.write(cipher.encrypt(input_chunk) + hmac.new(hmac_key, data, 
                    hashlib.sha256).digest())

            if end_of_line:
                break

Simply put, this reads an input file in blocks of 64KB at a time and encrypts these blocks, generating a HMAC using SHA-256 of the encrypted data, and appending that HMAC after each block. Decryption will happen by reading in 64KB + 32B chunks and calculating the HMAC of the first 64KB and comparing it against the SHA-256 sum occupying the last 32 bytes in the chunk.

Is this the right way to use an HMAC? Does it ensure security and authentication that the data was unmodified and decrypted with the right key?

FYI, the AES and HMAC keys are both derived from the same passphrase which is generated by running the input text through SHA-512, then through bcrypt, then through SHA-512 again. The output from the final SHA-512 is then split into two chunks, one used for the AES password and the other used for the HMAC.

SquareRootOfTwentyThree SquareRootOfTwentyThree · Accepted Answer · 2013-02-22T23:37:49

Yes, there are 2 security problems.

But first, I assume that with this statement at the end:

# write out encrypted data and an HMAC of the block
outfile.write(cipher.encrypt(input_chunk) + hmac.new(hmac_key, data, hashlib.sha256).digest())

you actually meant:

# write out encrypted data and an HMAC of the block
data = cipher.encrypt(input_chunk)
outfile.write(data + hmac.new(hmac_key, data, hashlib.sha256).digest())

Because data is not defined anywhere.

The 1st security problem is that you are authenticating each piece independently of the others, but not the composition. In other words, the attacker can reshuffle, duplicate, or remove any of the chunks and the receiver will not notice.

A more secure approach is to have one instance of HMAC only, pass all the encrypted data to it via the update method, and output one digest, at the very end.

Alternatively, if you want to enable the receiver to detect tampering before receiving the whole file, you can output the intermediate MAC for each piece. In fact, a call to digest does not change the state of the HMAC; you can keep calling update afterwards.

The 2nd security problem is that you don't use salt for your key derivation (I say that because you don't send it). Apart from password cracking, if you encrypt more than 2 files using the same password the attacker will also be able to freely mix chunks taken by either encrypted file - because the HMAC key is the same. Solution: use salt.

One last minor thing: infile.read(64 * 1024) may return less than 64*1024 bytes, but that does not mean you reached the end of the file.

HMAC-SHA256 with AES-256 in CBC mode

2 Answers