pbkdf2 and hash comparison

Question

I use mitsuhiko's implementation of pbkdf2 for password hashing:

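import hmac
import hashlib
from struct import Struct
from operator import xor
from itertools import izip, starmap

# Packs the block index as a big-endian 32-bit integer.
_pack_int = Struct('>I').pack

def pbkdf2_bin(data, salt, iterations=1000, keylen=24, hashfunc=None):
    """Returns a binary digest for the PBKDF2 hash algorithm of `data`
    with the given `salt`.  It iterates `iterations` times and produces a
    key of `keylen` bytes.  By default SHA-1 is used as hash function,
    a different hashlib `hashfunc` can be provided.
    """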
    hashfunc = hashfunc or hashlib.sha1
    mac = hmac.new(data, None, hashfunc)
    def _pseudorandom(x, mac=mac):
        h = mac.copy()
        h.update(x)
        return map(ord, h.digest())
    buf = []
    for block in xrange(1, -(-keylen // mac.digest_size) + 1):
        rv = u = _pseudorandom(salt + _pack_int(block))
        for i in xrange(iterations - 1):
            u = _pseudorandom(''.join(map(chr, u)))
            rv = starmap(xor, izip(rv, u))
        buf.extend(rv)
    return ''.join(map(chr, buf))[:keylen]

This function returns a binary digest, which is then base64-encoded and saved to the database. The same base64 string is also set as a cookie when a user logs in.

This function is used to compare password hashes:

def comparePasswords(password1, password2):
    if len(password1) != len(password2):
        return False
    diff = 0
    for x, y in izip(password1, password2):
        diff |= ord(x) ^ ord(y)
    return diff == 0

Is there any security difference between comparing binary hashes and comparing base64 strings? For example, when a user logs in, I calculate the binary digest of the submitted password, decode the base64 string from the database, and compare the two binary hashes. But when the user has a cookie with the base64 string, I compare it directly with the string from the database.

The second question is about salt:

os.urandom returns a binary string, but I encode it in base64 before using it in hash generation. Is there any reason I shouldn't do this, and should use the salt in binary form instead?

To anyone reading this: this is an old question, and PBKDF2 is no longer the recommended password hashing algorithm (although it is far better than many broken schemes people use in practice). Nowadays this title belongs to Argon2i, the winner of the Password Hashing Competition. - Erwan Legrand

Eli Collins · Accepted Answer · 2012-08-06T17:01:08

To answer question 1: There's no major security difference between comparing bytes and comparing a base64-encoded string; you're just comparing n or n*4/3 elements. The base64 comparison takes about 4/3 as long, but the amount of time is still trivial :)
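As a quick illustration of the n versus n*4/3 point, here is a minimal Python 3 sketch (the question's code is Python 2). It assumes a 24-byte digest, matching the default `keylen` in the question; `os.urandom` stands in for the real digest.

```python
import base64
import os

raw = os.urandom(24)             # stand-in for a 24-byte PBKDF2 digest
encoded = base64.b64encode(raw)  # base64 emits 4 ASCII chars per 3 input bytes

print(len(raw), len(encoded))    # 24 32
```

Either way the comparison loop walks a few dozen elements, so the extra third is negligible.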

That said, there was a Python developer discussion regarding a similar "constant time" comparison function, and they hit upon a few VM-level gotchas: if your input is a unicode string rather than bytes, and especially if the unicode contains non-ASCII characters, there may still be some subtle timing leaks (orders of magnitude smaller than the short-circuit-equality attack). Because of that, I'd stick to bytes if possible (whether binary data or ASCII-encoded base64 data). However, I wouldn't worry too much in the case of PBKDF2, since the timing attack that comparison function was designed to defeat applies more to HMAC signing than to password hash verification... but better to be safe than sorry.
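One way to follow the "stick to bytes" advice in current Python is sketched below. It assumes Python 3 and uses the standard library's `hmac.compare_digest`, which postdates this answer; `compare_hashes` is a hypothetical helper name, not something from the question or from Passlib.

```python
import base64
import hmac

def compare_hashes(a, b):
    """Compare two hashes in constant time, normalizing ASCII text to bytes."""
    # Encoding as ASCII rejects non-ASCII text outright (UnicodeEncodeError),
    # so only bytes ever reach the comparison.
    if isinstance(a, str):
        a = a.encode("ascii")
    if isinstance(b, str):
        b = b.encode("ascii")
    return hmac.compare_digest(a, b)

stored = base64.b64encode(b"\x01" * 24)   # base64 hash as stored in the database
cookie = stored.decode("ascii")           # the same value arriving as text
print(compare_hashes(stored, cookie))     # True
```

The same helper works for raw binary digests, since bytes inputs pass through unchanged.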

To answer question 2: For insecure constructions such as md5(salt+password), encoding the salt first would allow an attacker to use existing ASCII md5 tables to attack the entire digest, where a raw binary salt would make such tables useless. However, PBKDF2-HMAC does enough mangling that the only thing which matters is that the salt contains n bits of entropy - whether it's in the form of n/8 raw bytes, or n/6 base64 chars doesn't affect security.
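To make the salt point concrete, here is a small Python 3 sketch using the standard library's `hashlib.pbkdf2_hmac` (an assumption: it is not the mitsuhiko function from the question). The password, salt size, and round count are illustrative values only.

```python
import base64
import hashlib
import os

password = b"correct horse battery staple"
raw_salt = os.urandom(16)               # 16 raw bytes = 128 bits of entropy
b64_salt = base64.b64encode(raw_salt)   # 24 ASCII bytes (with padding), same 128 bits

key_raw = hashlib.pbkdf2_hmac("sha256", password, raw_salt, 10000, dklen=24)
key_b64 = hashlib.pbkdf2_hmac("sha256", password, b64_salt, 10000, dklen=24)

# Both are valid derived keys, but they differ, so verification must use
# the salt in the same form that was used when the hash was created.
print(key_raw != key_b64)
```

Either form is fine from a security standpoint; the practical requirement is consistency between hashing and verification.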

Other notes: I just wanted to add a few other points relating to what you posted...

I'd strongly recommend using SHA256/512 instead of SHA1 as the PBKDF2-HMAC hash function, and >= 10,000 rounds (as of 2012).
The idea of sending the digest over in a cookie (even w/o the salt) strikes me as potentially insecure... if someone steals that cookie (e.g. via a cross-site scripting attack), they could potentially log in as the user (though I don't know the rest of your application's security setup). Using a session layer with a randomly generated session id (e.g. Beaker) might be a good alternative.
I'd recommend using the Passlib PBKDF2 and consteq implementations; its PBKDF2 routine is about 5x faster than mitsuhiko's, and can take advantage of M2Crypto if present. (Disclaimer: I'm the author of Passlib.) It's also got a ready-made pbkdf2-sha256 password hashing function, though that won't be quite as much use if you're sending the digest out in the cookie.
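The first two notes can be sketched with nothing but the Python 3 standard library. This is an illustration under assumptions, not Passlib's or Beaker's API: `hash_password`, `verify_password`, and `new_session_id` are hypothetical names, and `ROUNDS` is simply the 2012 floor quoted above.

```python
import hashlib
import hmac
import os
import secrets

ROUNDS = 10000  # the 2012 floor mentioned above; tune upward for current hardware

def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ROUNDS)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

def new_session_id():
    """Random token for the cookie, unrelated to the password digest."""
    return secrets.token_urlsafe(32)

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))   # True
print(verify_password("wrong", salt, digest))     # False
```

With this split, the cookie carries only the random session id, so a stolen cookie reveals nothing derived from the password.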
