I'm looking for the ultimate postal code and zip code regex. I'm looking for something that will cover most (hopefully all) of the world.
20 Answers
There is none.
Postal/zip codes around the world don't follow a common pattern. In some countries they are made up by numbers, in others they can be combinations of numbers an letters, some can contain spaces, others dots, the number of characters can vary from two to at least six...
What you could do (theoretically) is create a seperate regex for every country in the world, not recommendable IMO. But you would still be missing on the validation part: Zip code 12345
may exist, but 12346
not, maybe 12344
doesn't exist either. How do you check for that with a regex?
You can't.
The unicode CLDR contains the postal code regex for each country. (158 regex's in total!)
- Download
core.zip
from http://unicode.org/Public/cldr/26.0.1/ - unzip core.zip
- Take a look at
common/supplemental/postalCodeData.xml
from the unzipped content (direct content: common/supplemental/postalCodeData.xml)
Google also has a web service with per-country address formatting information, including postal codes, here - http://i18napis.appspot.com/address (I found that link via http://unicode.org/review/pri180/ )
Edit
Here a copy of postalCodeData.xml regex :
"GB", "GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}"
"JE", "JE\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}"
"GG", "GY\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}"
"IM", "IM\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}"
"US", "\d{5}([ \-]\d{4})?"
"CA", "[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ ]?\d[ABCEGHJ-NPRSTV-Z]\d"
"DE", "\d{5}"
"JP", "\d{3}-\d{4}"
"FR", "\d{2}[ ]?\d{3}"
"AU", "\d{4}"
"IT", "\d{5}"
"CH", "\d{4}"
"AT", "\d{4}"
"ES", "\d{5}"
"NL", "\d{4}[ ]?[A-Z]{2}"
"BE", "\d{4}"
"DK", "\d{4}"
"SE", "\d{3}[ ]?\d{2}"
"NO", "\d{4}"
"BR", "\d{5}[\-]?\d{3}"
"PT", "\d{4}([\-]\d{3})?"
"FI", "\d{5}"
"AX", "22\d{3}"
"KR", "\d{3}[\-]\d{3}"
"CN", "\d{6}"
"TW", "\d{3}(\d{2})?"
"SG", "\d{6}"
"DZ", "\d{5}"
"AD", "AD\d{3}"
"AR", "([A-HJ-NP-Z])?\d{4}([A-Z]{3})?"
"AM", "(37)?\d{4}"
"AZ", "\d{4}"
"BH", "((1[0-2]|[2-9])\d{2})?"
"BD", "\d{4}"
"BB", "(BB\d{5})?"
"BY", "\d{6}"
"BM", "[A-Z]{2}[ ]?[A-Z0-9]{2}"
"BA", "\d{5}"
"IO", "BBND 1ZZ"
"BN", "[A-Z]{2}[ ]?\d{4}"
"BG", "\d{4}"
"KH", "\d{5}"
"CV", "\d{4}"
"CL", "\d{7}"
"CR", "\d{4,5}|\d{3}-\d{4}"
"HR", "\d{5}"
"CY", "\d{4}"
"CZ", "\d{3}[ ]?\d{2}"
"DO", "\d{5}"
"EC", "([A-Z]\d{4}[A-Z]|(?:[A-Z]{2})?\d{6})?"
"EG", "\d{5}"
"EE", "\d{5}"
"FO", "\d{3}"
"GE", "\d{4}"
"GR", "\d{3}[ ]?\d{2}"
"GL", "39\d{2}"
"GT", "\d{5}"
"HT", "\d{4}"
"HN", "(?:\d{5})?"
"HU", "\d{4}"
"IS", "\d{3}"
"IN", "\d{6}"
"ID", "\d{5}"
"IL", "\d{5}"
"JO", "\d{5}"
"KZ", "\d{6}"
"KE", "\d{5}"
"KW", "\d{5}"
"LA", "\d{5}"
"LV", "\d{4}"
"LB", "(\d{4}([ ]?\d{4})?)?"
"LI", "(948[5-9])|(949[0-7])"
"LT", "\d{5}"
"LU", "\d{4}"
"MK", "\d{4}"
"MY", "\d{5}"
"MV", "\d{5}"
"MT", "[A-Z]{3}[ ]?\d{2,4}"
"MU", "(\d{3}[A-Z]{2}\d{3})?"
"MX", "\d{5}"
"MD", "\d{4}"
"MC", "980\d{2}"
"MA", "\d{5}"
"NP", "\d{5}"
"NZ", "\d{4}"
"NI", "((\d{4}-)?\d{3}-\d{3}(-\d{1})?)?"
"NG", "(\d{6})?"
"OM", "(PC )?\d{3}"
"PK", "\d{5}"
"PY", "\d{4}"
"PH", "\d{4}"
"PL", "\d{2}-\d{3}"
"PR", "00[679]\d{2}([ \-]\d{4})?"
"RO", "\d{6}"
"RU", "\d{6}"
"SM", "4789\d"
"SA", "\d{5}"
"SN", "\d{5}"
"SK", "\d{3}[ ]?\d{2}"
"SI", "\d{4}"
"ZA", "\d{4}"
"LK", "\d{5}"
"TJ", "\d{6}"
"TH", "\d{5}"
"TN", "\d{4}"
"TR", "\d{5}"
"TM", "\d{6}"
"UA", "\d{5}"
"UY", "\d{5}"
"UZ", "\d{6}"
"VA", "00120"
"VE", "\d{4}"
"ZM", "\d{5}"
"AS", "96799"
"CC", "6799"
"CK", "\d{4}"
"RS", "\d{6}"
"ME", "8\d{4}"
"CS", "\d{5}"
"YU", "\d{5}"
"CX", "6798"
"ET", "\d{4}"
"FK", "FIQQ 1ZZ"
"NF", "2899"
"FM", "(9694[1-4])([ \-]\d{4})?"
"GF", "9[78]3\d{2}"
"GN", "\d{3}"
"GP", "9[78][01]\d{2}"
"GS", "SIQQ 1ZZ"
"GU", "969[123]\d([ \-]\d{4})?"
"GW", "\d{4}"
"HM", "\d{4}"
"IQ", "\d{5}"
"KG", "\d{6}"
"LR", "\d{4}"
"LS", "\d{3}"
"MG", "\d{3}"
"MH", "969[67]\d([ \-]\d{4})?"
"MN", "\d{6}"
"MP", "9695[012]([ \-]\d{4})?"
"MQ", "9[78]2\d{2}"
"NC", "988\d{2}"
"NE", "\d{4}"
"VI", "008(([0-4]\d)|(5[01]))([ \-]\d{4})?"
"PF", "987\d{2}"
"PG", "\d{3}"
"PM", "9[78]5\d{2}"
"PN", "PCRN 1ZZ"
"PW", "96940"
"RE", "9[78]4\d{2}"
"SH", "(ASCN|STHL) 1ZZ"
"SJ", "\d{4}"
"SO", "\d{5}"
"SZ", "[HLMS]\d{3}"
"TC", "TKCA 1ZZ"
"WF", "986\d{2}"
"XK", "\d{5}"
"YT", "976\d{2}"
use these regx
$ZIPREG=array(
"US"=>"^\d{5}([\-]?\d{4})?$",
"UK"=>"^(GIR|[A-Z]\d[A-Z\d]??|[A-Z]{2}\d[A-Z\d]??)[ ]??(\d[A-Z]{2})$",
"DE"=>"\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b",
"CA"=>"^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ])\ {0,1}(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$",
"FR"=>"^(F-)?((2[A|B])|[0-9]{2})[0-9]{3}$",
"IT"=>"^(V-|I-)?[0-9]{5}$",
"AU"=>"^(0[289][0-9]{2})|([1345689][0-9]{3})|(2[0-8][0-9]{2})|(290[0-9])|(291[0-4])|(7[0-4][0-9]{2})|(7[8-9][0-9]{2})$",
"NL"=>"^[1-9][0-9]{3}\s?([a-zA-Z]{2})?$",
"ES"=>"^([1-9]{2}|[0-9][1-9]|[1-9][0-9])[0-9]{3}$",
"DK"=>"^([D|d][K|k]( |-))?[1-9]{1}[0-9]{3}$",
"SE"=>"^(s-|S-){0,1}[0-9]{3}\s?[0-9]{2}$",
"BE"=>"^[1-9]{1}[0-9]{3}$",
"IN"=>"^\d{6}$"
);
Every postal code system uses only A-Z and/or 0-9 and sometimes space/dash
Not every country uses postal codes (ex. Ireland outside of Dublin), but we'll ignore that here.
The shortest postal code format is Sierra Leone with
NN
The longest is American Samoa with
NNNNN-NNNNNN
You should allow one space or dash.
Should not begin or end with space or dash
This should cover the above:
(?i)^[a-z0-9][a-z0-9\- ]{0,10}[a-z0-9]$
Trying to cover the whole world with one regular expression is not completely possible, and certainly not feasible or recommended.
Not to toot my own horn, but I've written some pretty thorough regular expressions which you may find helpful.
-
Basic validation: ^[ABCEGHJ-NPRSTVXY]{1}[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$ Extended validation: ^(A(0[ABCEGHJ-NPR]|1[ABCEGHK-NSV-Y]|2[ABHNV]|5[A]|8[A])|B(0[CEHJ-NPRSTVW]|1[ABCEGHJ-NPRSTV-Y]|2[ABCEGHJNRSTV-Z]|3[ABEGHJ-NPRSTVZ]|4[ABCEGHNPRV]|5[A]|6[L]|9[A])|C(0[AB]|1[ABCEN])|E(1[ABCEGHJNVWX]|2[AEGHJ-NPRSV]|3[ABCELNVYZ]|4[ABCEGHJ-NPRSTV-Z]|5[ABCEGHJ-NPRSTV]|6[ABCEGHJKL]|7[ABCEGHJ-NP]|8[ABCEGJ-NPRST]|9[ABCEGH])|G(0[ACEGHJ-NPRSTV-Z]|1[ABCEGHJ-NPRSTV-Y]|2[ABCEGJ-N]|3[ABCEGHJ-NZ]|4[ARSTVWXZ]|5[ABCHJLMNRTVXYZ]|6[ABCEGHJKLPRSTVWXZ]|7[ABGHJKNPSTXYZ]|8[ABCEGHJ-NPTVWYZ]|9[ABCHNPRTX])|H(0[HM]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NPRSTV-Z]|4[ABCEGHJ-NPRSTV-Z]|5[AB]|7[ABCEGHJ-NPRSTV-Y]|8[NPRSTYZ]|9[ABCEGHJKPRSWX])|J(0[ABCEGHJ-NPRSTV-Z]|1[ACEGHJ-NRSTXZ]|2[ABCEGHJ-NRSTWXY]|3[ABEGHLMNPRTVXYZ]|4[BGHJ-NPRSTV-Z]|5[ABCJ-MRTV-Z]|6[AEJKNRSTVWYXZ]|7[ABCEGHJ-NPRTV-Z]|8[ABCEGHLMNPRTVXYZ]|9[ABEHJLNTVXYZ])|K(0[ABCEGHJ-M]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-MPRSTVW]|4[ABCKMPR]|6[AHJKTV]|7[ACGHK-NPRSV]|8[ABHNPRV]|9[AHJKLV])|L(0[[ABCEGHJ-NPRS]]|1[ABCEGHJ-NPRSTV-Z]|2[AEGHJMNPRSTVW]|3[BCKMPRSTVXYZ]|4[ABCEGHJ-NPRSTV-Z]|5[ABCEGHJ-NPRSTVW]|6[ABCEGHJ-MPRSTV-Z]|7[ABCEGJ-NPRST]|8[EGHJ-NPRSTVW]|9[ABCGHK-NPRSTVWYZ])|M(1[BCEGHJ-NPRSTVWX]|2[HJ-NPR]|3[ABCHJ-N]|4[ABCEGHJ-NPRSTV-Y]|5[ABCEGHJ-NPRSTVWX]|6[ABCEGHJ-NPRS]|7[AY]|8[V-Z]|9[ABCLMNPRVW])|N(0[ABCEGHJ-NPR]|1[ACEGHKLMPRST]|2[ABCEGHJ-NPRTVZ]|3[ABCEHLPRSTVWY]|4[BGKLNSTVWXZ]|5[ACHLPRV-Z]|6[ABCEGHJ-NP]|7[AGLMSTVWX]|8[AHMNPRSTV-Y]|9[ABCEGHJKVY])|P(0[ABCEGHJ-NPRSTV-Y]|1[ABCHLP]|2[ABN]|3[ABCEGLNPY]|4[NPR]|5[AEN]|6[ABC]|7[ABCEGJKL]|8[NT]|9[AN])|R(0[ABCEGHJ-M]|1[ABN]|2[CEGHJ-NPRV-Y]|3[ABCEGHJ-NPRSTV-Y]|4[AHJKL]|5[AGH]|6[MW]|7[ABCN]|8[AN]|9[A])|S(0[ACEGHJ-NP]|2[V]|3[N]|4[AHLNPRSTV-Z]|6[HJKVWX]|7[HJ-NPRSTVW]|9[AHVX])|T(0[ABCEGHJ-MPV]|1[ABCGHJ-MPRSV-Y]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NPRZ]|4[ABCEGHJLNPRSTVX]|5[ABCEGHJ-NPRSTV-Z]|6[ABCEGHJ-NPRSTVWX]|7[AENPSVXYZ]|8[ABCEGHLNRSVWX]|9[ACEGHJKMNSVWX])|V(0[ABCEGHJ-NPRSTVWX]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NRSTV-Y]|4[ABCEGK-NPRSTVWXZ]|5[ABCEGHJ-NPRSTV-Z]|6[ABCEGHJ-NPRSTV-Z]|7[ABCEGHJ-NPRSTV-Y]|8[ABCGJ-NPRSTV-Z]|9[ABCEGHJ-NPRSTV-Z])|X(0[ABCGX]|1[A])|Y(0[AB]|1[A]))[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$
-
^[0-9]{5}(-[0-9]{4})?$
-
^([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}|(GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))$
It is not possible to guarantee accuracy without actually mailing something to an address and having the person let you know when they receive it, but we can narrow things by down by eliminating cases that we know are bad.
This looks like a good reference although it's not in Regex.
Really, unless you're actually shipping something to your users, I don't think it's worth the effort. And if you are shipping it, there are address cleaning tools/services you can look into to make it way easier on yourself.
Please note that this is quite a hard problem, as stated by the accepted answer. I guess it didn't deter the folks at geonames.org though. They have a file a country info file, which doesn't fit whole into this answer - limit is at 30000 chars apparently. There are regexes for about 150 countries.
I extracted the bits relevant to this question here :
AD ^(?:AD)*(\d{3})$
AM ^(\d{6})$
AR ^([A-Z]\d{4}[A-Z]{3})$
AT ^(\d{4})$
AU ^(\d{4})$
AX ^(?:FI)*(\d{5})$
AZ ^(?:AZ)*(\d{4})$
BA ^(\d{5})$
BB ^(?:BB)*(\d{5})$
BD ^(\d{4})$
BE ^(\d{4})$
BG ^(\d{4})$
BH ^(\d{3}\d?)$
BM ^([A-Z]{2}\d{2})$
BN ^([A-Z]{2}\d{4})$
BR ^(\d{8})$
BY ^(\d{6})$
CA ^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ]) ?(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$
CH ^(\d{4})$
CL ^(\d{7})$
CN ^(\d{6})$
CR ^(\d{4})$
CU ^(?:CP)*(\d{5})$
CV ^(\d{4})$
CX ^(\d{4})$
CY ^(\d{4})$
CZ ^(\d{5})$
DE ^(\d{5})$
DK ^(\d{4})$
DO ^(\d{5})$
DZ ^(\d{5})$
EC ^([a-zA-Z]\d{4}[a-zA-Z])$
EE ^(\d{5})$
EG ^(\d{5})$
ES ^(\d{5})$
ET ^(\d{4})$
FI ^(?:FI)*(\d{5})$
FM ^(\d{5})$
FO ^(?:FO)*(\d{3})$
FR ^(\d{5})$
GB ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
GE ^(\d{4})$
GF ^((97|98)3\d{2})$
GG ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
GL ^(\d{4})$
GP ^((97|98)\d{3})$
GR ^(\d{5})$
GT ^(\d{5})$
GU ^(969\d{2})$
GW ^(\d{4})$
HN ^([A-Z]{2}\d{4})$
HR ^(?:HR)*(\d{5})$
HT ^(?:HT)*(\d{4})$
HU ^(\d{4})$
ID ^(\d{5})$
IL ^(\d{5})$
IM ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
IN ^(\d{6})$
IQ ^(\d{5})$
IR ^(\d{10})$
IS ^(\d{3})$
IT ^(\d{5})$
JE ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
JO ^(\d{5})$
JP ^(\d{7})$
KE ^(\d{5})$
KG ^(\d{6})$
KH ^(\d{5})$
KP ^(\d{6})$
KR ^(?:SEOUL)*(\d{6})$
KW ^(\d{5})$
KZ ^(\d{6})$
LA ^(\d{5})$
LB ^(\d{4}(\d{4})?)$
LI ^(\d{4})$
LK ^(\d{5})$
LR ^(\d{4})$
LS ^(\d{3})$
LT ^(?:LT)*(\d{5})$
LU ^(\d{4})$
LV ^(?:LV)*(\d{4})$
MA ^(\d{5})$
MC ^(\d{5})$
MD ^(?:MD)*(\d{4})$
ME ^(\d{5})$
MG ^(\d{3})$
MK ^(\d{4})$
MM ^(\d{5})$
MN ^(\d{6})$
MQ ^(\d{5})$
MT ^([A-Z]{3}\d{2}\d?)$
MV ^(\d{5})$
MX ^(\d{5})$
MY ^(\d{5})$
MZ ^(\d{4})$
NC ^(\d{5})$
NE ^(\d{4})$
NF ^(\d{4})$
NG ^(\d{6})$
NI ^(\d{7})$
NL ^(\d{4}[A-Z]{2})$
NO ^(\d{4})$
NP ^(\d{5})$
NZ ^(\d{4})$
OM ^(\d{3})$
PF ^((97|98)7\d{2})$
PG ^(\d{3})$
PH ^(\d{4})$
PK ^(\d{5})$
PL ^(\d{5})$
PM ^(97500)$
PR ^(\d{9})$
PT ^(\d{7})$
PW ^(96940)$
PY ^(\d{4})$
RE ^((97|98)(4|7|8)\d{2})$
RO ^(\d{6})$
RS ^(\d{6})$
RU ^(\d{6})$
SA ^(\d{5})$
SD ^(\d{5})$
SE ^(?:SE)*(\d{5})$
SG ^(\d{6})$
SH ^(STHL1ZZ)$
SI ^(?:SI)*(\d{4})$
SK ^(\d{5})$
SM ^(4789\d)$
SN ^(\d{5})$
SO ^([A-Z]{2}\d{5})$
SV ^(?:CP)*(\d{4})$
SZ ^([A-Z]\d{3})$
TC ^(TKCA 1ZZ)$
TH ^(\d{5})$
TJ ^(\d{6})$
TM ^(\d{6})$
TN ^(\d{4})$
TR ^(\d{5})$
TW ^(\d{5})$
UA ^(\d{5})$
US ^\d{5}(-\d{4})?$
UY ^(\d{5})$
UZ ^(\d{6})$
VA ^(\d{5})$
VE ^(\d{4})$
VI ^\d{5}(-\d{4})?$
VN ^(\d{6})$
WF ^(986\d{2})$
YT ^(\d{5})$
ZA ^(\d{4})$
ZM ^(\d{5})$
CS ^(\d{5})$
Hopefully I didn't make any mistake, my regex-fu is pretty weak.
If someone is still interested in how to validate zip codes I've found a solution:
Using Google Geocoding API
we can check validity of ZIP code having both Country code and a ZIP code itself.
For example I live in Ukraine so I can check like this: https://maps.googleapis.com/maps/api/geocode/json?components=postal_code:80380|country:UA
Or using JS API: https://developers.google.com/maps/documentation/javascript/geocoding#ComponentFiltering
Where 80380
is valid ZIP for Ukraine, actually every (#####) is valid.
Google returns ZERO_RESULTS
status if nothing found.
Or OK
and a result if both are correct.
Hope this will be helpful.
As others have pointed out, one regex to rule them all is unlikely. However, you can craft regular expressions for as many countries as you need using the address formatting info from the Universal Postal Union -- a little-known UN agency.
For example, here are the address formatting rules, including postal code, for a handful of countries (PDF format):
The problem is going to be that you probably have no good means of keeping up with the changing postal code requirements of countries on the other side of the globe and which you share no common languages. Unless you have a large enough budget to track this, you are almost certainly better off giving the responsibility of validating addresses to google or yahoo.
Both companies provide address lookup facuilities through a programmable API.
Given that there are so many edge cases for each country (eg. London addresses may use a slightly different format to the rest of the UK) I don't think that there is an ultimate regex other than maybe:
[0-9a-zA-Z]+
Best of going with a fairly broad pattern (well not quite as broad as the above), or treat each country/region with a specific pattern of its own!
UPDATE: However, it may be possible to dynamically construct a regex based upon lots of smaller, region specific rules - not sure about performance though!
Lots of country specific patterns can be found on the RegExLib site.
Why are you doing this and why do you care? As Tom Ritter pointed out, it doesn't matter whether you even have a ZIP/postal code at all, much less whether it's valid or not, until and unless you are actually going to be sending something to that address. Even if you expect that you will be sending them something someday, that doesn't mean you need a postal code today.
There are reasons beyond shipping for having an accurate postal code. Travel agencies doing tours that cross borders (Eurozone excepted of course) need this information ahead of time to give to the authorities. Often this information is entered by an agent that may or may not be familiar with such things. ANY method that can cut down on mistakes is a Good Idea™
However, writing a regex that would cover all postal codes in the world would be insane.
Somebody was asking about list of formatting mailing addresses, and I think this is what he was looking for...
Frank's Compulsive Guide to Postal Addresses: http://www.columbia.edu/~fdc/postal/ Doesn't help much with street-level issues, however.
My work uses a couple of tools to assist with this: - Lexis-Nexis services, including NCOA lookups (you'll get address standardization for "free") - "Melissa Data" http://www.melissadata.com
This is a very simple RegEx for validating US Zipcode (not ZipCode Plus Four):
(?!([089])\1{4})\d{5}
Seems all five digit numeric are valid zipcodes except 00000
, 88888
& 99999
.
I have tested this RegEx with http://regexpal.com/
SP