4
votes

For a payment system that allows bank/wire transfers, I need to reliably associate payments with the corresponding user account that they are intended for. For this, the user should include a reference number on the transfer that is associated with his account.

I would like to generate this number with built-in redundancy (extra symbols), so that I can detect and correct up to N of the following (probably common) errors:

  • An incorrect symbol in the sequence (typo)
  • A swap of two symbols (which I guess is the same as two incorrect ones)
  • A missing symbol in the sequence
  • An extra symbol in the sequence

I searched around a bit and it seems like Reed Solomon or BCH are commonly used codes for this. The only thing I couldn't find is whether they support the last case, i.e. extra symbols.

Also, I would like for the code to have a failure mode where it says: "this is so screwed up, I can't fix it" rather than giving me a random "corrected" result. I guess I could do this simply by generating sparse reference numbers and hoping that it will be unlikely that it will hit a valid one by accident, but I'd rather have something like: "I can correct up to 5 errors, but if it's more than 3, I give up."

Any thoughts? Thank you!

1

1 Answers

1
votes

I haven't spent that much time looking into this further, yet, but I think I have come up with a preliminary way to solve this problem, which I will pursue for now:

I will create the account reference number out of an alphabet of 32 characters. This alphabet I will split into 2 sets of 16 characters, optimizing the sets to minimize the chance that a random typo will yield a letter from the other set. For example, simply split your keyboard in half, using the letters in the box with the corners [1], [4], [v], [z] for the one set and the other letters as the other set.

I will then use a [14, 8, 7]16 Reed-Solomon code to encode the 32-bit account number, which I first split into eight 4-bit characters.

The resulting message, I will turn into the reference number by choosing the 1st, 3rd, 5th, ... characters from the 1st alphabet-half and the others from the 2nd alphabet-half. That way, I can "resynchronize" the reference number if I detect any swapped, extra, or missing characters.

After resynchronization, the RS code should then allow me to correct up to 3 other typos and if anyone makes more mistakes than that, they deserve to run into issues with their payment... :)

I would love to hear any comments anyone might have on this approach.