232
votes

I'm having some trouble understanding the purpose of a salt to a password. It's my understanding that the primary use is to hamper a rainbow table attack. However, the methods I've seen to implement this don't seem to really make the problem harder.

I've seen many tutorials suggesting that the salt be used as the following:

$hash =  md5($salt.$password)

The reasoning being that the hash now maps not to the original password, but to a combination of the password and the salt. But say $salt=foo and $password=bar and $hash=3858f62230ac3c915f300c664312c63f. Now somebody with a rainbow table could reverse the hash and come up with the input "foobar". They could then try all combinations of passwords (f, fo, foo, ... oobar, obar, bar, ar, r). It might take a few more milliseconds to get the password, but not much else.

The other use I've seen is on my Linux system. In /etc/shadow, the hashed passwords are actually stored with the salt. For example, a salt of "foo" and password of "bar" would hash to this: $1$foo$te5SBM.7C25fFDu6bIRbX1. If a hacker somehow were able to get his hands on this file, I don't see what purpose the salt serves, since the reverse hash of te5SBM.7C25fFDu6bIRbX1 is known to contain "foo".

Thanks for any light anybody can shed on this.

EDIT: Thanks for the help. To summarize what I understand, the salt makes the hashed password more complex, thus making it much less likely to exist in a precomputed rainbow table. What I misunderstood before was that I was assuming a rainbow table existed for ALL hashes.

Also, updated here - use of md5 hashing is no longer best practice. stackoverflow.com/questions/12724935/salt-and-passwords - StuartLC
Thanks for the Edit. I had the same doubt which is now clarified. So the point of 'Salt' really is to make it highly unlikely for a Rainbow table to contain the hash of the adulterated (salted) password, in the first place. :D - Vaibhav

10 Answers

252
votes

A public salt will not make dictionary attacks harder when cracking a single password. As you've pointed out, the attacker has access to both the hashed password and the salt, so when running the dictionary attack, she can simply use the known salt when attempting to crack the password.

A public salt does two things: makes it more time-consuming to crack a large list of passwords, and makes it infeasible to use a rainbow table.

To understand the first one, imagine a single password file that contains hundreds of usernames and passwords. Without a salt, I could compute "md5(attempt[0])", and then scan through the file to see if that hash shows up anywhere. If salts are present, then I have to compute "md5(salt[a] . attempt[0])", compare against entry A, then "md5(salt[b] . attempt[0])", compare against entry B, etc. Now I have n times as much work to do, where n is the number of usernames and passwords contained in the file.
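
A rough sketch of that counting argument, written in Python rather than the PHP-style pseudocode used above. The user names, salts, and word list are made up for illustration, and MD5 appears only because the question uses it. The sketch counts every hash attempt; a real attacker would also stop guessing for an entry once it is cracked.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical password file: user -> (salt, password). Illustration only.
users = {
    "alice": ("jX95psDZ", "hello"),
    "bob": ("LPgB0sdg", "foobar"),
    "carol": ("dZVUABJt", "qwerty"),
}
attempts = ["123456", "hello", "foobar", "qwerty"]

unsalted_file = {user: md5_hex(pw) for user, (_, pw) in users.items()}
salted_file = {user: (salt, md5_hex(salt + pw)) for user, (salt, pw) in users.items()}

def crack_unsalted(password_file, attempts):
    """One hash per attempt, then scan the whole file for that hash."""
    hashes_computed, cracked = 0, {}
    for attempt in attempts:
        digest = md5_hex(attempt)
        hashes_computed += 1
        for user, stored in password_file.items():
            if stored == digest:
                cracked[user] = attempt
    return cracked, hashes_computed

def crack_salted(password_file, attempts):
    """One hash per attempt per entry, because every entry has its own salt."""
    hashes_computed, cracked = 0, {}
    for attempt in attempts:
        for user, (salt, stored) in password_file.items():
            hashes_computed += 1
            if md5_hex(salt + attempt) == stored:
                cracked[user] = attempt
    return cracked, hashes_computed

cracked_u, work_u = crack_unsalted(unsalted_file, attempts)
cracked_s, work_s = crack_salted(salted_file, attempts)
print(work_u, work_s)  # 4 hashes without salts, 12 with (4 attempts x 3 entries)
```

Both runs recover the same passwords; only the number of hash computations changes, and it grows with the number of entries in the file.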

To understand the second one, you have to understand what a rainbow table is. For this discussion, think of a rainbow table as a large list of pre-computed hashes for commonly-used passwords (a real rainbow table compresses that list into hash chains to save space, but the idea is the same). Imagine again the password file without salts. All I have to do is go through each line of the file, pull out the hashed password, and look it up in the rainbow table. I never have to compute a single hash. If the look-up is considerably faster than the hash function (which it probably is), this will considerably speed up cracking the file.

But if the password file is salted, then the rainbow table would have to contain "salt . password" pre-hashed. If the salt is sufficiently random, this is very unlikely. I'll probably have things like "hello" and "foobar" and "qwerty" in my list of commonly-used, pre-hashed passwords (the rainbow table), but I'm not going to have things like "jX95psDZhello" or "LPgB0sdgxfoobar" or "dZVUABJtqwerty" pre-computed. That would make the rainbow table prohibitively large.
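
To make that concrete, here is a small sketch in Python. It uses a plain dictionary as a stand-in for the precomputed table (a real rainbow table is stored as hash chains, so this is a simplification), and the password list is just the three examples from the paragraph above.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Toy precomputed table: hash -> password, built once from common passwords.
common_passwords = ["hello", "foobar", "qwerty"]
lookup_table = {md5_hex(pw): pw for pw in common_passwords}

unsalted_hash = md5_hex("hello")
salted_hash = md5_hex("jX95psDZ" + "hello")  # salt + password, as in the text

found_unsalted = lookup_table.get(unsalted_hash)  # recovered by lookup alone
found_salted = lookup_table.get(salted_hash)      # None: never precomputed
print(found_unsalted, found_salted)
```

The salted entry would only be found if the table had been built with that exact salt in front of every candidate password.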

So, the salt reduces the attacker back to one-computation-per-row-per-attempt, which, when coupled with a sufficiently long, sufficiently random password, is (generally speaking) uncrackable.

120
votes

The other answers don't seem to address your misunderstandings of the topic, so here goes:

Two different uses of salt

I've seen many tutorials suggesting that the salt be used as the following:

$hash = md5($salt.$password)

[...]

The other use I've seen is on my linux system. In the /etc/shadow the hashed passwords are actually stored with the salt.

You always have to store the salt with the password, because in order to validate what the user entered against your password database, you have to combine the input with the salt, hash it and compare it to the stored hash.
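
The /etc/shadow entry from the question shows this storage layout directly: the scheme identifier, the salt, and the hash sit in one string separated by "$". A minimal Python sketch that pulls the parts back out follows; the helper name split_crypt_field is made up for this example, and it only handles the simple three-part form shown in the question.

```python
def split_crypt_field(field: str):
    """Split a '$id$salt$hash' string into (id, salt, hash)."""
    _, scheme_id, salt, digest = field.split("$")
    return scheme_id, salt, digest

scheme_id, salt, digest = split_crypt_field("$1$foo$te5SBM.7C25fFDu6bIRbX1")
print(scheme_id)  # 1 (the identifier used for MD5-based crypt)
print(salt)       # foo
print(digest)     # te5SBM.7C25fFDu6bIRbX1
```

At login time the system reads the salt from this field, hashes the entered password with it, and compares the result with the stored hash.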

Security of the hash

Now somebody with a rainbow table could reverse the hash and come up with the input "foobar".

[...]

since the reverse hash of te5SBM.7C25fFDu6bIRbX1 is known to contain "foo".

It is not possible to reverse the hash as such (in theory, at least). The hash of "foo" and the hash of "saltfoo" have nothing in common. Changing even one bit in the input of a cryptographic hash function should completely change the output.
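
You can see this for yourself with a few lines of Python. This is only an illustration using MD5 (because the question does); the exact number of differing bits depends on the inputs, but for a good hash function it is typically close to half of the output bits.

```python
import hashlib

def md5_as_int(text: str) -> int:
    """Return the 128-bit MD5 digest of text as an integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")

plain = md5_as_int("foo")
salted = md5_as_int("saltfoo")

# XOR leaves a 1 wherever the two digests disagree; count those bits.
differing_bits = bin(plain ^ salted).count("1")
print(f"{differing_bits} of 128 output bits differ")
```

Nothing in the second digest tells you that its input ended in "foo", which is why a table built for unsalted passwords cannot be patched up afterwards.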

This means you cannot build a rainbow table with the common passwords and then later "update" it with some salt. You have to take the salt into account from the beginning.

This is the whole reason you need a rainbow table in the first place. Because you cannot get to the password from the hash, you precompute all the hashes of the most likely used passwords and then compare your hashes with their hashes.

Quality of the salt

But say $salt=foo

"foo" would be an extremely poor choice of salt. Normally you would use a random value, encoded in ASCII.

Also, each password has its own salt, different (hopefully) from all other salts on the system. This means that the attacker has to attack each password individually instead of hoping that one of the hashes matches one of the values in her database.

The attack

If a hacker somehow were able to get his hands on this file, I don't see what purpose the salt serves,

A rainbow table attack always needs the password file (/etc/shadow on a modern Linux system, or whatever password database is used), or else how would you compare the hashes in the rainbow table to the hashes of the actual passwords?

As for the purpose: let's say the attacker wants to build a rainbow table for 100,000 commonly used English words and typical passwords (think "secret"). Without salt she would have to precompute 100,000 hashes. Even with the traditional UNIX salt of 2 characters (each is one of 64 choices: [a-zA-Z0-9./]) there are 64 × 64 = 4,096 possible salts, so she would have to compute and store 409,600,000 hashes... quite an improvement.
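
The arithmetic is easy to check. A quick Python sketch, using only the figures from the paragraph above (100,000 candidate passwords, 2 salt characters, 64 choices each):

```python
candidate_passwords = 100_000
salt_alphabet_size = 64   # [a-zA-Z0-9./]
salt_length = 2

possible_salts = salt_alphabet_size ** salt_length
table_entries = candidate_passwords * possible_salts

print(possible_salts)  # 4096
print(table_entries)   # 409600000
```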

87
votes

The idea of the salt is to make the hashed input much harder to guess by brute force than a normal character-based password would be. Rainbow tables are often built with a particular character set in mind, and don't always include all possible combinations (though they can).

So a good salt value would be a random 128-bit or longer integer. This is what makes rainbow-table attacks fail. By using a different salt value for each stored password, you also ensure that a rainbow table built for one particular salt value (as could be the case if you're a popular system with a single salt value) does not give you access to all passwords at once.
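
In Python, for example, a salt like that could be generated as sketched below. The function name new_salt is made up for this example; secrets.token_bytes is the standard-library call for cryptographically strong random bytes.

```python
import secrets

def new_salt(num_bytes: int = 16) -> bytes:
    """Return a fresh random salt; 16 bytes is 128 bits."""
    return secrets.token_bytes(num_bytes)

# One salt per stored password, never a single system-wide value.
salt_for_alice = new_salt()
salt_for_bob = new_salt()
print(salt_for_alice.hex())
print(salt_for_bob.hex())
```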

36
votes

Yet another great question, with many very thoughtful answers -- +1 to SO!

One small point that I haven't seen mentioned explicitly is that, by adding a random salt to each password, you're virtually guaranteeing that two users who happened to choose the same password will produce different hashes.

Why is this important?

Imagine the password database at a large software company in the northwest US. Suppose it contains 30,000 entries, of which 500 have the password bluescreen. Suppose further that a hacker manages to obtain this password, say by reading it in an email from the user to the IT department. If the passwords are unsalted, the hacker can find the hashed value in the database, then simply pattern-match it to gain access to the other 499 accounts.

Salting the passwords ensures that each of the 500 accounts has a unique (salt+password), generating a different hash for each of them, and thereby reducing the breach to a single account. And let's hope, against all probability, that any user naive enough to write a plaintext password in an email message doesn't have access to the undocumented API for the next OS.
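
A small Python sketch of the same scenario, with 500 accounts sharing the password from the example. SHA-256 and the 16-byte salt are assumptions made for the illustration, not something the scenario prescribes.

```python
import hashlib
import secrets

def salted_sha256(salt: bytes, password: str) -> str:
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

password = "bluescreen"
accounts = 500

# Unsalted: every account stores the same hash.
unsalted = [hashlib.sha256(password.encode("utf-8")).hexdigest()
            for _ in range(accounts)]

# Salted: each account gets its own random salt before hashing.
salted = [salted_sha256(secrets.token_bytes(16), password)
          for _ in range(accounts)]

distinct_unsalted = len(set(unsalted))
distinct_salted = len(set(salted))
print(distinct_unsalted, distinct_salted)
```

With no salt there is a single distinct hash, so one known password exposes every matching row; with per-account salts the hashes are all different (barring an astronomically unlikely salt collision).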

15
votes

I was searching for a good method to apply salts and found this excellent article with sample code:

http://crackstation.net/hashing-security.htm

The author recommends using random salts per user, so that gaining access to a salt won't render the entire list of hashes as easy to crack.

To Store a Password:

  • Generate a long random salt using a CSPRNG.
  • Prepend the salt to the password and hash it with a standard cryptographic hash function such as SHA256.
  • Save both the salt and the hash in the user's database record.

To Validate a Password:

  • Retrieve the user's salt and hash from the database.
  • Prepend the salt to the given password and hash it using the same hash function.
  • Compare the hash of the given password with the hash from the database. If they match, the password is correct. Otherwise, the password is incorrect.
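
The two lists above can be sketched in Python roughly as follows. This follows the steps literally (salt prepended, SHA-256), with an in-memory dict standing in for the user table; the function names are made up for the example. Note that a deliberately slow function such as hashlib.pbkdf2_hmac is generally preferred over a single fast hash for real password storage.

```python
import hashlib
import hmac
import secrets

def hash_password(salt: bytes, password: str) -> bytes:
    # Prepend the salt to the password and hash with SHA-256.
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

def store_password(database: dict, user: str, password: str) -> None:
    salt = secrets.token_bytes(16)  # long random salt from a CSPRNG
    database[user] = (salt, hash_password(salt, password))

def validate_password(database: dict, user: str, password: str) -> bool:
    if user not in database:
        return False
    salt, stored_hash = database[user]
    candidate = hash_password(salt, password)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_hash)

db = {}  # hypothetical stand-in for the user table
store_password(db, "alice", "correct horse")
print(validate_password(db, "alice", "correct horse"))  # True
print(validate_password(db, "alice", "wrong guess"))    # False
```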

12
votes

The reason a salt can make a rainbow-table attack fail is that for n bits of salt, the rainbow table has to be 2^n times larger than the table size without the salt.

Your example of using 'foo' as a salt could make the rainbow table about 16 million times larger (three 8-bit characters are 24 bits, and 2^24 = 16,777,216).

Given Carl's example of a 128-bit salt, this makes the table 2^128 times larger - now that's big - or put another way, how long before someone has portable storage that big?
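
For anyone who wants to see the numbers, a short Python sketch of the 2^n growth factor follows; the function name is made up for the example.

```python
def table_growth_factor(salt_bits: int) -> int:
    """How many times larger a precomputed table must be for a given salt size."""
    return 2 ** salt_bits

foo_factor = table_growth_factor(3 * 8)   # 'foo' is three 8-bit characters
big_factor = table_growth_factor(128)     # a 128-bit random salt

print(foo_factor)  # 16777216
print(big_factor)  # 340282366920938463463374607431768211456
```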

10
votes

Most methods of breaking hash-based password storage rely on brute force attacks. A rainbow attack is essentially a more efficient dictionary attack. It is designed to use the low cost of digital storage to build a map from a substantial subset of possible passwords to their hashes, and so make the reverse mapping easy. This sort of attack works because many passwords tend to be either fairly short or follow one of a few word-based patterns.

Such attacks are ineffective when passwords contain many more characters and do not conform to common word-based formats. A user with a strong password to start with won't be vulnerable to this style of attack. Unfortunately, many people do not pick good passwords. But there's a compromise: you can improve a user's password by adding random junk to it. So now, instead of "hunter2", their password effectively becomes "hunter2908!fld2R75{R7/;508PEzoz^U430", which is a much stronger password. However, because you now have to store this additional password component, the composite password is less effective than a fully secret password of the same length would be. As it turns out, there's still a net benefit to such a scheme, since each password, even a weak one, is no longer vulnerable to the same pre-computed hash / rainbow table. Instead, each password hash entry is vulnerable only to a table computed specifically for its own salt.

Say you have a site which has weak password strength requirements. If you use no password salt at all, your hashes are vulnerable to pre-computed hash tables. Someone with access to your hashes would thus have access to the passwords for a large percentage of your users (however many used vulnerable passwords, which would be a substantial percentage). If you use a constant password salt, then pre-computed hash tables are no longer valuable, so someone would have to spend the time to compute a custom hash table for that salt. They could do so incrementally, though, computing tables which cover ever greater portions of the problem space. The most vulnerable passwords (e.g. simple word-based passwords, very short alphanumeric passwords) would be cracked in hours or days; less vulnerable passwords would be cracked after a few weeks or months. As time goes on, an attacker would gain access to passwords for an ever-growing percentage of your users. If you use a unique salt for every password, then it would take days or months to gain access to each one of those vulnerable passwords.

As you can see, when you step up from no salt to a constant salt to a unique salt, you impose an increase in effort of several orders of magnitude at each step. Without a salt, the weakest of your users' passwords are trivially accessible. With a constant salt, those weak passwords are accessible to a determined attacker. With a unique salt, the cost of accessing passwords is raised so high that only the most determined attacker could gain access to a tiny subset of vulnerable passwords, and then only at great expense.

That is precisely the situation you want to be in. You can never fully protect users from poor password choice, but you can raise the cost of compromising your users' passwords to a level that makes compromising even one user's password prohibitively expensive.

3
votes

One purpose of salting is to defeat precomputed hash tables. If someone has a list of millions of pre-computed hashes, they aren't going to be able to look up $1$foo$te5SBM.7C25fFDu6bIRbX1 in their table even though they know the hash and the salt. They'll still have to brute force it.

Another purpose, as Carl S mentions, is to make brute forcing a list of hashes more expensive (give them all different salts).

Both of these objectives are still accomplished even if the salts are public.

1
votes

As far as I know, the salt is intended to make dictionary attacks harder.

It's a known fact that many people will use common words for passwords instead of seemingly random strings.

So, a hacker could use this to his advantage instead of using just brute force. He will not look for passwords like aaa, aab, aac... but instead use words and common passwords (like Lord of the Rings names! ;) )

So if my password is Legolas, a hacker could try that and guess it within a "few" tries. However, if we salt the password and it becomes fooLegolas, the hash will be different, so a dictionary of precomputed hashes will no longer match; the attacker has to rehash every dictionary word with that salt.

Hope that helps!

-2
votes

I assume that you are using PHP (the md5() function and the $-prefixed variables suggest it). If so, you can try looking at this article, the Shadow Password HOWTO, especially the 11th paragraph.

Also, if you are wary of message digest algorithms, you can try real cipher algorithms, such as the ones provided by the mcrypt module, or stronger message digest algorithms, such as the ones provided by the mhash module (sha1, sha256, and others).

I think that stronger message digest algorithms are a must. MD5 and SHA1 are known to have collision problems.