Error safe/correcting resource identifier

Question

The receiver is my website and the sender is the same, but the medium is noisy: a user. He will read an alphanumeric code of length 6 and later input the same code to identify a resource. A good use for an error-correcting code, I thought, and rather than do the research I thought I'd just put the question out there. Or I might be going about it the wrong way, since the situation is rather like sending a perfect dictionary along with every transmission.

The requirements on the code are simply:

6 alphanumeric characters to start with, at least until I run out.
If the user gets it wrong I should still be able to identify the right resource.
No resource is preferable to the wrong one.
Easy to code, or with free libraries available for .NET.

Any suggestions?

Edit:

It seems to me that half the requirements can be fulfilled by choosing the codes wisely, i.e. with sufficient distance between them. This strategy looks even better when I realize that the longer it has been since the codes were generated, the less likely they are to be used.

The codes won't be directly typed into my website, so I can't give immediate feedback. Actually, we can assume that unless I can verify the code I can't even identify the user, so I can't really give feedback at all.
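The idea of choosing codes with sufficient distance between them could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the alphabet is 0-9 plus A-Z, "distance" means Hamming distance (number of differing positions), the minimum distance is 3, and the names `issue_code` and `lookup` are made up. The linear scan in `issue_code` is for clarity only and would be far too slow for millions of codes.

```python
import random
import string

ALPHABET = string.digits + string.ascii_uppercase  # assumed 36-symbol alphabet


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def issue_code(issued, min_distance=3, length=6, rng=random):
    """Draw random codes until one is far enough from every issued code."""
    while True:
        candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
        if all(hamming(candidate, c) >= min_distance for c in issued):
            issued.add(candidate)
            return candidate


def lookup(entered, issued):
    """Return the unique issued code within one substitution, else None."""
    entered = entered.upper()
    if entered in issued:
        return entered
    matches = set()
    # Try every single-character substitution of the input (6 * 36 strings).
    for i in range(len(entered)):
        for ch in ALPHABET:
            candidate = entered[:i] + ch + entered[i + 1:]
            if candidate in issued:
                matches.add(candidate)
    # An ambiguous or missing match yields no resource rather than a wrong one.
    return matches.pop() if len(matches) == 1 else None
```

With a minimum distance of 3, an input with one wrong character has exactly one issued code within distance 1, so it can be corrected. An input with two wrong characters can still land within distance 1 of a different code and be matched to the wrong resource; a minimum distance of 4 would rule that out at the cost of fewer usable codes. For scale, the sphere-packing bound for distance 3 over this alphabet is 36^6 / (1 + 6*35), about 10.3 million codes, and random generation like this will reach far fewer.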

How hard is the constraint of 6 alphanumerics? And how many resources do you need to distinguish (worst case / high estimate)? — Peter Taylor
The error-correction code you choose is heavily dependent on how many errors you're prepared to tolerate. — Oliver Charlesworth
6 digits are easy to remember, so I'll use it as long as possible. Perhaps as many as 10 million resources before we move on to 7 digits. I'm counting 1.3 billion codes with 6 digits. Possible? — Martin
@Oli The consequences for a wrong hit are not disastrous, just costly and annoying. A no hit is better. So which ones should I research in your opinion? — Martin

Hans Passant · Accepted Answer · 2011-01-30T23:20:26

Well, no, your first-order approach should be to check whether the user entered the right number. A human mistyping a number or string is 99% of the problem. That's well established in practice: the check digit is used in many common codes, UPC and ISBN being the ones that you'd see every day. You can flag the error and they can re-type it.
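A check character for a 6-character alphanumeric code might look something like the sketch below. It is not the UPC or ISBN algorithm itself, just the same idea (a weighted sum, as ISBN-10 does modulo 11) under assumptions of my own: the alphabet is 0-9 plus A-Z, the modulus is 37 (the smallest prime above 36), the five payload characters get weights 2 to 6, and the function names are made up. Because the check value 36 has no symbol in the alphabet, payloads that would need it are simply not issued.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # assumed: values 0..35
MODULUS = 37  # assumed: smallest prime larger than the alphabet size


def _weighted_sum(payload):
    """Sum of value * weight, with weights 2, 3, 4, ... from the left."""
    return sum((i + 2) * ALPHABET.index(ch) for i, ch in enumerate(payload))


def add_check_char(payload):
    """Append a check character, or return None if this payload has none."""
    payload = payload.upper()
    check = (-_weighted_sum(payload)) % MODULUS
    if check == 36:
        return None  # no symbol for 36: skip this payload when issuing codes
    return payload + ALPHABET[check]


def is_valid(code):
    """True if the 6-character code passes the weighted mod-37 check."""
    code = code.upper()
    if len(code) != 6 or any(ch not in ALPHABET for ch in code):
        return False
    # The check character carries weight 1; a valid code sums to 0 mod 37.
    total = _weighted_sum(code[:-1]) + ALPHABET.index(code[-1])
    return total % MODULUS == 0
```

Since 37 is prime and all six weights are distinct, this catches any single wrong character and any swap of two different characters. It only detects the error, it does not correct it, and about 1 in 37 payloads has to be skipped.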

Error-correcting codes are common too but have a very different application. Traditionally they are used in digital signaling media, aiming to detect and correct bit errors. Reed-Solomon is the hundred-pound gorilla there, really put on the map by the music CD.

That doesn't really work well in practice with a human: they'll introduce at least 6 bits of bad data by mistyping one key. That's very hard to correct; you'd have to add lots and lots of redundancy bits to the adjacent letters, making it more likely for the code to be mistyped. The best way for human-readable codes is for the code to make sense to a human, something that triggers the 'that looks wrong' response. Like a given name, as long as you're not Frank Zappa's kid. Otherwise, that is the foundation of codes like letter-letter + numbers, etcetera.
