144
votes

The .NET framework ships with 6 different hashing algorithms:

  • MD5: 16 bytes (Time to hash 500MB: 1462 ms)
  • SHA-1: 20 bytes (1644 ms)
  • SHA256: 32 bytes (5618 ms)
  • SHA384: 48 bytes (3839 ms)
  • SHA512: 64 bytes (3820 ms)
  • RIPEMD: 20 bytes (7066 ms)

Each of these functions performs differently; MD5 being the fastest and RIPEMD being the slowest.

MD5 has the advantage that it fits in the built-in Guid type; and it is the basis of the type 3 UUID. SHA-1 hash is the basis of type 5 UUID. Which makes them really easy to use for identification.

MD5 however is vulnerable to collision attacks, SHA-1 is also vulnerable but to a lesser degree.

Under what conditions should I use which hashing algorithm?

Particular questions I'm really curious to see answered are:

  • Is MD5 not to be trusted? Under normal situations when you use the MD5 algorithm with no malicious intent and no third party has any malicious intent would you expect ANY collisions (meaning two arbitrary byte[] producing the same hash)

  • How much better is RIPEMD than SHA1? (if its any better) its 5 times slower to compute but the hash size is the same as SHA1.

  • What are the odds of getting non-malicious collisions when hashing file-names (or other short strings)? (Eg. 2 random file-names with same MD5 hash) (with MD5 / SHA1 / SHA2xx) In general what are the odds for non-malicious collisions?

This is the benchmark I used:

    static void TimeAction(string description, int iterations, Action func) {
        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++) {
            func();
        }
        watch.Stop();
        Console.Write(description);
        Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }

    static byte[] GetRandomBytes(int count) {
        var bytes = new byte[count];
        (new Random()).NextBytes(bytes);
        return bytes;
    }


    static void Main(string[] args) {

        var md5 = new MD5CryptoServiceProvider();
        var sha1 = new SHA1CryptoServiceProvider();
        var sha256 = new SHA256CryptoServiceProvider();
        var sha384 = new SHA384CryptoServiceProvider();
        var sha512 = new SHA512CryptoServiceProvider();
        var ripemd160 = new RIPEMD160Managed();

        var source = GetRandomBytes(1000 * 1024);

        var algorithms = new Dictionary<string,HashAlgorithm>();
        algorithms["md5"] = md5;
        algorithms["sha1"] = sha1;
        algorithms["sha256"] = sha256;
        algorithms["sha384"] = sha384;
        algorithms["sha512"] = sha512;
        algorithms["ripemd160"] = ripemd160;

        foreach (var pair in algorithms) {
            Console.WriteLine("Hash Length for {0} is {1}", 
                pair.Key, 
                pair.Value.ComputeHash(source).Length);
        }

        foreach (var pair in algorithms) {
            TimeAction(pair.Key + " calculation", 500, () =>
            {
                pair.Value.ComputeHash(source);
            });
        }

        Console.ReadKey();
    }
9
The fact that you mention md5 fits in the GUID (16 byte) format suggests a fundamental misunderstanding. A hash is not guaranteed to be unique, but is rare (and hard to fake if used in a cryptographic sense) and derived from the thing of which it is a hash while a GUID is, well, unique but unrelated to the content of the thing it identifies. They are used for very different purposes.Barry Wark
Correct its unrelated, its just a handy implementation specific fact. I understand that you can not fit infinity into a 16 bytes. You can get collisions with ANY hashing algorithmSam Saffron
Also a Guid is just in-practice unique, in theory if you kept on generating Guids eventually you would get duplicates.Sam Saffron
You really should not stuff a hash into a GUID, even if it fits. Simplest example: two copies of the same file should have different GUIDs, but same hash. First 8 letters of a person's name also fit quite nicely into 16 bytes.dbkk
@user2332868 Breaking of SHA-1 has no effect on probability of accidental collisions. When a malicious intent is a threat for your usage, I think blindly picking any hash function is wrong, and you need to spend time doing risk/cost analysis for your specific case.Andrey Tarantsov

9 Answers

144
votes

In cryptography, hash functions provide three separate functions.

  1. Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same.
  2. Preimage Resistance: Given a hash, how hard is it to find another message that hashes the same? Also known as a one way hash function.
  3. Second preimage resistance: Given a message, find another message that hashes the same.

These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.

It has been shown that MD5 is not collision resistant, however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller key size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.

SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability. For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.

SHA2 is a new family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different key lengths.

RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.

In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.

115
votes

All hash functions are "broken"

The pigeonhole principle says that try as hard as you will you can not fit more than 2 pigeons in 2 holes (unless you cut the pigeons up). Similarly you can not fit 2^128 + 1 numbers in 2^128 slots. All hash functions result in a hash of finite size, this means that you can always find a collision if you search through "finite size" + 1 sequences. It's just not feasible to do so. Not for MD5 and not for Skein.

MD5/SHA1/Sha2xx have no chance collisions

All the hash functions have collisions, its a fact of life. Coming across these collisions by accident is the equivalent of winning the intergalactic lottery. That is to say, no one wins the intergalactic lottery, its just not the way the lottery works. You will not come across an accidental MD5/SHA1/SHA2XXX hash, EVER. Every word in every dictionary, in every language, hashes to a different value. Every path name, on every machine in the entire planet has a different MD5/SHA1/SHA2XXX hash. How do I know that, you may ask. Well, as I said before, no one wins the intergalactic lottery, ever.

But ... MD5 is broken

Sometimes the fact that its broken does not matter.

As it stands there are no known pre-image or second pre-image attacks on MD5.

So what is so broken about MD5, you may ask? It is possible for a third party to generate 2 messages, one of which is EVIL and another of which is GOOD that both hash to the same value. (Collision attack)

Nonetheless, the current RSA recommendation is not to use MD5 if you need pre-image resistance. People tend to err on the side of caution when it comes to security algorithms.

So what hash function should I use in .NET?

  • Use MD5 if you need the speed/size and don't care about birthday attacks or pre-image attacks.

Repeat this after me, there are no chance MD5 collisions, malicious collisions can be carefully engineered. Even though there are no known pre-image attacks to date on MD5 the line from the security experts is that MD5 should not be used where you need to defend against pre-image attacks. SAME goes for SHA1.

Keep in mind, not all algorithms need to defend against pre-image or collision attacks. Take the trivial case of a first pass search for duplicate files on your HD.

  • Use SHA2XX based function if you want a cryptographically secure hash function.

No one ever found any SHA512 collision. EVER. They have tried really hard. For that matter no one ever found any SHA256 or 384 collision ever. .

  • Don't use SHA1 or RIPEMD unless its for an interoperability scenario.

RIPMED has not received the same amount of scrutiny that SHAX and MD5 has received. Both SHA1 and RIPEMD are vulnerable to birthday attacks. They are both slower than MD5 on .NET and come in the awkward 20 byte size. Its pointless to use these functions, forget about them.

SHA1 collision attacks are down to 2^52, its not going to be too long until SHA1 collisions are out in the wild.

For up to date information about the various hash functions have a look at the hash function zoo.

But wait there is more

Having a fast hash function can be a curse. For example: a very common usage for hash functions is password storage. Essentially, you calculate hash of a password combined with a known random string (to impede rainbow attacks) and store that hash in the database.

The problem is, that if an attacker gets a dump of the database, he can, quite effectively guess passwords using brute-force. Every combination he tries only takes a fraction of millisecond, and he can try out hundreds of thousands of passwords a second.

To work around this issue, the bcrypt algorithm can be used, it is designed to be slow so the attacker will be heavily slowed down if attacking a system using bcrypt. Recently scrypt has made some headline and is considered by some to be more effective than bcrypt but I do not know of a .Net implementation.

36
votes

Update:

Times have changed, we have a SHA3 winner. I would recommend using keccak (aka SHA3) winner of the SHA3 contest.

Original Answer:

In order of weakest to strongest I would say:

  1. RIPEMD BROKEN, Should never be used as can be seen in this pdf
  2. MD-5 BROKEN, Should never be used, can be broken in 2 minutes with a laptop
  3. SHA-1 BROKEN, Should never be used, is broken in principal, attacks are getting better by the week
  4. SHA-2 WEAK, Will probably be broken in the next few years. A few weaknesses have been found. Note that generally the higher key size, the harder the hash function is to break. While key size = strength is not always true, it is mostly true. So SHA-256 is probably weaker than SHA-512.
  5. Skein NO KNOWN WEAKNESSES, is a candidate for SHA-3. It is fairly new and thus untested. It has been implemented in a bunch of languages.
  6. MD6 NO KNOWN WEAKNESSES, is another a candidate for SHA-3. Probably stronger than Skien, but slower on single core machines. Like Skien it is untested. Some security minded developers are using it, in mission critical roles.

Personally I'd use MD6, because one can never been too paranoid. If speed is a real concern I'd look at Skein, or SHA-256.

3
votes

In MD5's defense, there is no known way to produce a file with an arbitrary MD5 hash. The original author must plan in advance to have a working collision. Thus if the receiver trusts the sender, MD5 is fine. MD5 is broken if the signer is malicious, but it is not known to be vulnerable to man-in-the-middle attacks.

3
votes

It would be a good ideea to take a look at the BLAKE2 algorythm.

As it is described, it is faster than MD5 and at least as secure as SHA-3. It is also implemented by several software applications, including WinRar.

2
votes

Which one you use really depends on what you are using it for. If you just want to make sure that files don't get corrupted in transit and aren't that concerned about security, go for fast and small. If you need digital signatures for multi-billion dollar federal bailout agreements and need to make sure they aren't forged, go for hard to spoof and slow.

2
votes

I would like to chime in (before md5 gets torn apart) that I do still use md5 extensively despite its overwhelming brokenness for a lot of crypto.

As long as you don't care to protect against collisions (you are still safe to use md5 in an hmac as well) and you do want the speed (sometimes you want a slower hash) then you can still use md5 confidently.

0
votes

I am not an expert at this sort of thing, but I keep up with the security community and a lot of people there consider the md5 hash broken. I would say that which one to use depends on how sensitive the data is and the specific application. You might be able to get away with a slightly less secure hash as long as the key is good and strong.

0
votes

Here are my suggestions for you:

  1. You should probably forget MD5 if you anticipate attacks. There are many rainbow tables for them online, and corporations like the RIAA have been known to be able to produce sequences with equivalent hashes.
  2. Use a salt if you can. Including the message length in the message can make it very difficult to make a useful hash collision.
  3. As a general rule of thumb, more bits means less collisions (by pigeonhole principle) and slower, and maybe more secure (unless you are a math genius who can find vulnerabilities).

See here for a paper detailing an algorithm to create md5 collisions in 31 seconds with a desktop Intel P4 computer.

http://eprint.iacr.org/2006/105