4
votes

I would like some help regarding a regular expression in Javascript.

I am trying to match any string that contains either only Basic Latin (ASCII) characters or only Greek Unicode characters. Strings that mix characters from these two sets should not match.

I have this regular expression, which matches the exact opposite (all strings that contain at least one Greek and one Latin character), but I cannot find a way to negate it:

https://regex101.com/r/JHzmhc/1

Thanks in advance.

2
Do you mean any ASCII and Greek? - Wiktor Stribiżew
Yes, Basic Latin == ASCII, right? - ktsangop
When you say "Latin", it sounds as if you want to match (or not match) letters. When you use \x00-\x7F, you match the whole ASCII table chars, thus, it is more appropriate to name those chars ASCII chars. - Wiktor Stribiżew
Edited the question based on your suggestions, thank you. - ktsangop
@WiktorStribiżew "The C0 Controls and Basic Latin block" would be the most formal and precise name, though Unicode documentation is littered with references to ASCII and 0-9 are called the ASCII Digits. - Tom Blodget

2 Answers

3
votes

You may use

^(?:[\u0000-\u007F]+|[\u0370-\u03FF]+)$

See the regex demo

Details:

  • ^ - start of string
  • (?: - start of a non-capturing group (so that the anchors apply to both alternatives):
    • [\u0000-\u007F]+ - 1+ ASCII chars
    • | - or
    • [\u0370-\u03FF]+ - 1+ chars from the Greek and Coptic block (U+0370-U+03FF)
  • ) - end of group
  • $ - end of string.
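
As a quick check of the pattern above, here is a minimal sketch of how it might be used from JavaScript. The helper name isAsciiOrGreekBlock is hypothetical and only there for illustration; the regex itself is the one given in this answer.

```javascript
// Pattern from the answer: all ASCII, or all Greek-and-Coptic-block chars, never mixed.
const asciiOrGreekBlock = /^(?:[\u0000-\u007F]+|[\u0370-\u03FF]+)$/;

// Hypothetical helper name, used only for this illustration.
function isAsciiOrGreekBlock(text) {
  return asciiOrGreekBlock.test(text);
}

console.log(isAsciiOrGreekBlock("hello, world 123")); // true  (ASCII only)
console.log(isAsciiOrGreekBlock("αβγ"));              // true  (Greek only)
console.log(isAsciiOrGreekBlock("abcαβγ"));           // false (mixed)
console.log(isAsciiOrGreekBlock("α β"));              // false (the space is an ASCII char)
console.log(isAsciiOrGreekBlock(""));                 // false (+ requires at least one char)
```

Note that spaces, digits and punctuation count as ASCII here, so a Greek phrase containing a space is rejected as mixed. If that is not wanted, the Greek alternative would need to allow those chars explicitly.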

2
votes

Wiktor’s solution has the correct general format. Unfortunately, matching Greek symbols isn’t as simple as [\u0370-\u03FF] — that way you miss out on many Greek symbols.
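
To illustrate the gap, here is a small sketch that tests the block range against two characters that appear in the longer character list further down in this answer but lie outside U+0370-U+03FF. The variable name greekBlockOnly is just for illustration.

```javascript
// Only the Greek and Coptic block, as in the range-based pattern.
const greekBlockOnly = /^[\u0370-\u03FF]+$/;

// U+03B1 GREEK SMALL LETTER ALPHA lies inside the block.
console.log(greekBlockOnly.test("\u03B1")); // true

// U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI lies in the Greek Extended block.
console.log(greekBlockOnly.test("\u1F00")); // false

// U+2126 OHM SIGN lies in the Letterlike Symbols block.
console.log(greekBlockOnly.test("\u2126")); // false
```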

With Unicode property escapes in regular expressions, you’d do:

/^(?:[\0-\x7F]+|\p{Script_Extensions=Greek}+)$/u
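
A minimal sketch of trying the property-escape pattern, assuming an engine that supports \p{…} together with the u flag. The ASCII class is written as [\0-\x7F], the same form used in the transpiled pattern, and the variable name asciiOrGreekScript is only illustrative.

```javascript
// Assumes an engine with Unicode property escapes; the u flag is required for \p{…}.
const asciiOrGreekScript = /^(?:[\0-\x7F]+|\p{Script_Extensions=Greek}+)$/u;

console.log(asciiOrGreekScript.test("abc"));          // true  (ASCII only)
console.log(asciiOrGreekScript.test("\u03B1\u1F00")); // true  (Greek, including Greek Extended)
console.log(asciiOrGreekScript.test("a\u03B1"));      // false (mixed)
```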

Unicode property escapes were standardized in ECMAScript 2018. For engines that do not support them, the pattern can be transpiled to:

/^(?:[\0-\x7F]+|(?:[\u0342\u0345\u0370-\u0373\u0375-\u0377\u037A-\u037D\u037F\u0384\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF-\u1DC1\u1F00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FC4\u1FC6-\u1FD3\u1FD6-\u1FDB\u1FDD-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFE\u2126\uAB65]|\uD800[\uDD40-\uDD8E\uDDA0]|\uD834[\uDE00-\uDE45])+)$/

Here’s the demo: https://regex101.com/r/cmNTLA/1