Which cryptographic hash function should I choose?

Question

The .NET framework ships with 6 different hashing algorithms:

MD5: 16 bytes (Time to hash 500MB: 1462 ms)
SHA-1: 20 bytes (1644 ms)
SHA256: 32 bytes (5618 ms)
SHA384: 48 bytes (3839 ms)
SHA512: 64 bytes (3820 ms)
RIPEMD: 20 bytes (7066 ms)

Each of these functions performs differently; MD5 being the fastest and RIPEMD being the slowest.

MD5 has the advantage that it fits in the built-in Guid type; and it is the basis of the type 3 UUID. SHA-1 hash is the basis of type 5 UUID. Which makes them really easy to use for identification.

MD5 however is vulnerable to collision attacks, SHA-1 is also vulnerable but to a lesser degree.

Under what conditions should I use which hashing algorithm?

Particular questions I'm really curious to see answered are:

Is MD5 not to be trusted? Under normal situations when you use the MD5 algorithm with no malicious intent and no third party has any malicious intent would you expect ANY collisions (meaning two arbitrary byte[] producing the same hash)
How much better is RIPEMD than SHA1? (if its any better) its 5 times slower to compute but the hash size is the same as SHA1.
What are the odds of getting non-malicious collisions when hashing file-names (or other short strings)? (Eg. 2 random file-names with same MD5 hash) (with MD5 / SHA1 / SHA2xx) In general what are the odds for non-malicious collisions?

This is the benchmark I used:

    static void TimeAction(string description, int iterations, Action func) {
        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++) {
            func();
        }
        watch.Stop();
        Console.Write(description);
        Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }

    static byte[] GetRandomBytes(int count) {
        var bytes = new byte[count];
        (new Random()).NextBytes(bytes);
        return bytes;
    }


    static void Main(string[] args) {

        var md5 = new MD5CryptoServiceProvider();
        var sha1 = new SHA1CryptoServiceProvider();
        var sha256 = new SHA256CryptoServiceProvider();
        var sha384 = new SHA384CryptoServiceProvider();
        var sha512 = new SHA512CryptoServiceProvider();
        var ripemd160 = new RIPEMD160Managed();

        var source = GetRandomBytes(1000 * 1024);

        var algorithms = new Dictionary<string,HashAlgorithm>();
        algorithms["md5"] = md5;
        algorithms["sha1"] = sha1;
        algorithms["sha256"] = sha256;
        algorithms["sha384"] = sha384;
        algorithms["sha512"] = sha512;
        algorithms["ripemd160"] = ripemd160;

        foreach (var pair in algorithms) {
            Console.WriteLine("Hash Length for {0} is {1}", 
                pair.Key, 
                pair.Value.ComputeHash(source).Length);
        }

        foreach (var pair in algorithms) {
            TimeAction(pair.Key + " calculation", 500, () =>
            {
                pair.Value.ComputeHash(source);
            });
        }

        Console.ReadKey();
    }

The fact that you mention md5 fits in the GUID (16 byte) format suggests a fundamental misunderstanding. A hash is not guaranteed to be unique, but is rare (and hard to fake if used in a cryptographic sense) and derived from the thing of which it is a hash while a GUID is, well, unique but unrelated to the content of the thing it identifies. They are used for very different purposes. — Barry Wark
Correct its unrelated, its just a handy implementation specific fact. I understand that you can not fit infinity into a 16 bytes. You can get collisions with ANY hashing algorithm — Sam Saffron
Also a Guid is just in-practice unique, in theory if you kept on generating Guids eventually you would get duplicates. — Sam Saffron
You really should not stuff a hash into a GUID, even if it fits. Simplest example: two copies of the same file should have different GUIDs, but same hash. First 8 letters of a person's name also fit quite nicely into 16 bytes. — dbkk
@user2332868 Breaking of SHA-1 has no effect on probability of accidental collisions. When a malicious intent is a threat for your usage, I think blindly picking any hash function is wrong, and you need to spend time doing risk/cost analysis for your specific case. — Andrey Tarantsov

Eric Burnett Eric Burnett · Accepted Answer · 2009-04-29T06:30:15

In cryptography, hash functions provide three separate functions.

Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same.
Preimage Resistance: Given a hash, how hard is it to find another message that hashes the same? Also known as a one way hash function.
Second preimage resistance: Given a message, find another message that hashes the same.

These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.

It has been shown that MD5 is not collision resistant, however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller key size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.

SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability. For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.

SHA2 is a new family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different key lengths.

RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.

In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.

Which cryptographic hash function should I choose?

Under what conditions should I use which hashing algorithm?

9 Answers

All hash functions are "broken"

MD5/SHA1/Sha2xx have no chance collisions

But ... MD5 is broken

So what hash function should I use in .NET?

But wait there is more

Update:

Original Answer: