1
votes

Given the following regex:

^((?:\d+\s)?\w+(?:\s\w+)?)

which works well for extracting the first (up to) two words (preceded by a number, if there is one), how can I adapt it to include words that are hyphenated?

I tried adding \- after the first \w+, but that only found the first half of the hyphenated word and broke the original functionality.

Some examples of valid matches are:

  • 1 Two
  • 3 Four Five
  • Six-Seven-Eight
  • Nine Ten

They are components of an address field and therefore, I suppose, might contain an apostrophe somewhere too. How could I also allow for that?

1
I don't think you need to escape a - outside of [ ]. (user1486147)
Try \-* because if you leave out this quantifier, you only match hyphenated words. (das_weezul)

1 Answer

2
votes

Use [\s\-] instead of \s

[\s\-] matches either a whitespace character or a -.

So it should be

^((?:\d+[\s\-])?\w+(?:[\s\-]\w+)?)
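As a quick check, here is a minimal sketch that runs this pattern over the four examples from the question. Python is an assumption here (the question does not name a language), and the names FIRST_TRY and first_try_results are made up for illustration.

```python
import re

# The pattern from the answer: a space or a hyphen is accepted as the separator.
FIRST_TRY = re.compile(r"^((?:\d+[\s\-])?\w+(?:[\s\-]\w+)?)")

# The valid matches listed in the question.
examples = ["1 Two", "3 Four Five", "Six-Seven-Eight", "Nine Ten"]


def first_try_results(items):
    """Map each input string to what group 1 captures (None if no match)."""
    results = {}
    for text in items:
        m = FIRST_TRY.match(text)
        results[text] = m.group(1) if m else None
    return results


for text, captured in first_try_results(examples).items():
    print(f"{text!r} -> {captured!r}")
```

Three of the examples come back unchanged, but Six-Seven-Eight is cut to Six-Seven, because only one separator is allowed after the first word.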

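However, that regex won't cover all of your valid matches: on Six-Seven-Eight it captures only Six-Seven, because it allows at most one separator after the first word. You should use the regex given below instead.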


A better way to match multiple words separated by - or a space would be

^\w+([\s\-]\w+){0,2}$
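A short sketch of how this behaves, again assuming Python. Because of the trailing $, the pattern has to match the whole string, so it validates a field rather than extracting the leading words from a longer one. The second pattern, VARIANT, is not from the answer: it is one possible way to keep the question's original "first two words" structure while treating hyphens and apostrophes as part of a word. All names are hypothetical.

```python
import re

# Pattern from the answer: up to three \w+ runs joined by whitespace or a hyphen,
# anchored at both ends.
ANSWER = re.compile(r"^\w+([\s\-]\w+){0,2}$")

# Assumed variant (not from the answer): the question's original regex with \w+
# widened to [\w'-]+, so hyphens and apostrophes count as word characters.
VARIANT = re.compile(r"^((?:\d+\s)?[\w'-]+(?:\s[\w'-]+)?)")

examples = ["1 Two", "3 Four Five", "Six-Seven-Eight", "Nine Ten"]


def whole_string_ok(text):
    """True if the answer's pattern accepts the entire string."""
    return ANSWER.match(text) is not None


def leading_words(text):
    """Return the optional number plus the first (up to) two words, or None."""
    m = VARIANT.match(text)
    return m.group(1) if m else None


for text in examples + ["3 Four Five Six", "12 St John's Road"]:
    print(f"{text!r}: ok={whole_string_ok(text)}, leading={leading_words(text)!r}")
```

With the answer's pattern, all four examples are accepted, while a longer string such as 3 Four Five Six is rejected outright. The variant instead returns the leading part of a longer string, for instance 12 St John's from 12 St John's Road.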