
I would like to create an algorithm that can detect credit card numbers (CCNs) in various types of files.

The simple approach to finding CCNs is to use regular expressions such as the following:

  1. Visa: ^4[0-9]{12}(?:[0-9]{3})?$ All Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13.
  2. MasterCard: ^5[1-5][0-9]{14}$ All MasterCard numbers start with the numbers 51 through 55. All have 16 digits.
  3. American Express: ^3[47][0-9]{13}$ American Express card numbers start with 34 or 37 and have 15 digits.
  4. Diners Club: ^3(?:0[0-5]|[68][0-9])[0-9]{11}$ Diners Club card numbers begin with 300 through 305, 36 or 38. All have 14 digits. There are Diners Club cards that begin with 5 and have 16 digits. These are a joint venture between Diners Club and MasterCard, and should be processed like a MasterCard.
  5. Discover: ^6(?:011|5[0-9]{2})[0-9]{12}$ Discover card numbers begin with 6011 or 65. All have 16 digits.
  6. JCB: ^(?:2131|1800|35\d{3})\d{11}$ JCB cards beginning with 2131 or 1800 have 15 digits. JCB cards beginning with 35 have 16 digits.

Each number found can then be checked with the Luhn (mod 10) algorithm, and if it passes, we can say that we have found a CCN.
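For reference, the regex-plus-Luhn approach described above can be sketched roughly as follows. This is only a minimal sketch, not what any commercial tool does. The names `BRAND_PATTERNS`, `CANDIDATE`, `luhn_valid` and `find_ccns` are my own, and I am assuming the `^`/`$` anchors should be replaced by digit lookarounds so the patterns can match inside a larger body of text without matching in the middle of a longer digit run.

```python
import re

# Brand patterns from the list above, with the ^/$ anchors removed so they
# can be searched for inside arbitrary text (an assumption of this sketch).
BRAND_PATTERNS = {
    "Visa": r"4[0-9]{12}(?:[0-9]{3})?",
    "MasterCard": r"5[1-5][0-9]{14}",
    "American Express": r"3[47][0-9]{13}",
    "Diners Club": r"3(?:0[0-5]|[68][0-9])[0-9]{11}",
    "Discover": r"6(?:011|5[0-9]{2})[0-9]{12}",
    "JCB": r"(?:2131|1800|35[0-9]{3})[0-9]{11}",
}

# The lookbehind/lookahead reject candidates embedded in longer digit runs.
CANDIDATE = re.compile(
    r"(?<![0-9])(?:" + "|".join(BRAND_PATTERNS.values()) + r")(?![0-9])"
)


def luhn_valid(number):
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_ccns(text):
    """Return all regex candidates in text that also pass the Luhn check."""
    return [m.group(0) for m in CANDIDATE.finditer(text) if luhn_valid(m.group(0))]
```

For example, `find_ccns("order 4111111111111111 ref 4111111111111112")` keeps only the first number, since the second fails the Luhn check. Note that this sketch does not handle numbers written with spaces or hyphens between digit groups.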

But in my experience, this simple method produces a very high number of false positives and false negatives.

What algorithms or heuristics could be used to reduce the false positives and false negatives? Advanced software such as PCI Data Finder or Card Recon provides more reliable results, and those results are definitely not achieved by simple regular expression matching plus a Luhn check.

What exactly goes wrong if you just grab all 15/16 digit numbers (omitting hyphens) and check them? It seems that it might be fast enough (compared to the "simple" regex method) to make up for false positives. - Geobits

1 Answer


You could purchase a list of BINs (Bank Identification Numbers) from a source like BINDB.com and thereby reduce false positives by only considering numbers whose first six (or in some cases eight) digits match an existing card-issuing bank.

If you were only looking for US-issued cards, you could reduce the number of false positives substantially further with the same approach.
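A rough sketch of that filtering step might look like the following. The `KNOWN_BINS` table here is a hypothetical placeholder with made-up entries. A real table would be loaded from whatever BIN list you purchase, and its actual format is not something I can speak to. The function names `bin_country` and `filter_by_bin` are likewise just illustrative.

```python
# HYPOTHETICAL sample data: placeholder prefixes mapped to an issuing country.
# A real table would come from a purchased BIN list.
KNOWN_BINS = {
    "411111": "US",
    "555555": "US",
    "40000000": "GB",
}


def bin_country(number, bins=KNOWN_BINS):
    """Return the country of the matching 8- or 6-digit prefix, or None."""
    # Try the longer (more specific) prefix first.
    for length in (8, 6):
        country = bins.get(number[:length])
        if country is not None:
            return country
    return None


def filter_by_bin(candidates, bins=KNOWN_BINS, country=None):
    """Keep candidates whose prefix is a known BIN, optionally for one country."""
    kept = []
    for number in candidates:
        found = bin_country(number, bins)
        if found is None:
            continue  # prefix matches no known issuer: likely a false positive
        if country is not None and found != country:
            continue  # known issuer, but outside the country of interest
        kept.append(number)
    return kept
```

Calling `filter_by_bin(candidates, country="US")` would then keep only candidates whose prefix belongs to a US issuer in the table. This would be applied to numbers that have already passed the Luhn check.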