12
votes

Update: In response to comments about the ambiguity of my question, I've added more detail.

(Terminology: by words I am referring to any succession of alphanumeric characters.)

I'm looking for a regex to match the following, verbatim:

  • Words.
  • Words with one apostrophe at the beginning.
  • Words with any number of non-contiguous apostrophes throughout the middle.
  • Words with one apostrophe at the end.

I would also like to match the following, not verbatim but with the apostrophes removed:

  • Words with an apostrophe at the beginning and at the end would be matched to the word, without the apostrophes. So 'foo' would be matched to foo.
  • Words with more than one contiguous apostrophe in the middle would be resolved to two different words: the fragment before the contiguous apostrophes and the fragment after the contiguous apostrophes. So, foo''bar would be matched to foo and bar.
  • Words with more than one contiguous apostrophe at the beginning or at the end would be matched to the word, without the apostrophes. So, ''foo would be matched to foo and ''foo'' to foo.

Examples: these would be matched verbatim:

  • 'bout
  • it's
  • persons'

But these would be ignored:

  • '
  • ''

And, for 'open', open would be matched.

5
The question is not well defined. How are words delimited? How many apostrophes are allowed in a word? What should the correct output be for the example 'how'about'this'one'? Looking at the example of 'open', I doubt there is a solution that doesn't use lookahead/lookbehind. - Ilia K.
@Ilia K., what's wrong with using lookahead/lookbehind? - maček
I don't think you can have a single regex that matches 1 word, 2 words, 3 words, or N words. E.g., you could write something that works for 'foo', ''foo, bar'', foo'b, but not something that works for foo''bar, 'foo''bar', 'foo'bar'zim, at the same time. - maček

5 Answers

21
votes

Try using this:

(?=.*\w)^(\w|')+$

'bout     # pass
it's      # pass
persons'  # pass
'         # fail
''        # fail

Regex Explanation

NODE      EXPLANATION
  (?=       look ahead to see if there is:
    .*        any character except \n (0 or more times
              (matching the most amount possible))
    \w        word characters (a-z, A-Z, 0-9, _)
  )         end of look-ahead
  ^         the beginning of the string
  (         group and capture to \1 (1 or more times
            (matching the most amount possible)):
    \w        word characters (a-z, A-Z, 0-9, _)
   |         OR
    '         '\''
  )+        end of \1 (NOTE: because you're using a
            quantifier on this capture, only the LAST
            repetition of the captured pattern will be
            stored in \1)
  $         before an optional \n, and the end of the
            string
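
A quick way to check the pattern above against the examples from the question is a short script. This is only a sketch in Python's `re` module; the names `WHOLE_WORD` and `is_word` are illustrative and not part of the answer.

```python
import re

# Pattern copied from the answer above; WHOLE_WORD and is_word are
# illustrative names, not part of the original answer.
WHOLE_WORD = re.compile(r"(?=.*\w)^(\w|')+$")

def is_word(token):
    """Return True if the whole token passes the pattern."""
    return WHOLE_WORD.match(token) is not None

# Examples taken from the question, plus 'open' for comparison.
for token in ["'bout", "it's", "persons'", "'", "''", "'open'"]:
    print(f"{token!r}: {'pass' if is_word(token) else 'fail'}")
```

Note that the pattern validates a whole string rather than extracting words from running text, and it does not strip anything: 'open' passes with both apostrophes still attached.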
3
votes
/('\w+)|(\w+'\w+)|(\w+')|(\w+)/
  • '\w+ Matches a ' followed by one or more word characters, OR
  • \w+'\w+ Matches one or more word characters followed by a ' followed by one or more word characters, OR
  • \w+' Matches one or more word characters followed by a ', OR
  • \w+ Matches one or more word characters
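
To see how the four alternatives behave on running text, here is a small sketch using Python's `re`. The helper name `find_words` is made up for illustration, and the pattern is used without the surrounding slashes.

```python
import re

# The four alternatives from the answer above, tried left to right.
WORDS = re.compile(r"('\w+)|(\w+'\w+)|(\w+')|(\w+)")

def find_words(text):
    # group(0) is the full match regardless of which alternative matched;
    # findall() would instead return a tuple of the four capture groups.
    return [m.group(0) for m in WORDS.finditer(text)]

print(find_words("'bout it's persons'"))
print(find_words("' ''"))
print(find_words("'open'"))
```

The three verbatim examples come back intact and the lone apostrophes are skipped. For 'open', though, the first alternative matches 'open and the closing apostrophe is left over, so the "strip both apostrophes" case from the question is not covered by this pattern alone.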
1
votes

How about this?

'?\b[0-9A-Za-z']+\b'?

EDIT: the previous version doesn't include apostrophes on the sides.
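
A minimal sketch for trying this pattern in Python's `re` follows. `find_words` is a hypothetical helper name, and the sample strings come from the question.

```python
import re

# Pattern from the answer above: optional apostrophe, a run of
# alphanumerics/apostrophes bounded by \b, optional apostrophe.
WORDS = re.compile(r"'?\b[0-9A-Za-z']+\b'?")

def find_words(text):
    # The pattern has no capture groups, so findall() returns full matches.
    return WORDS.findall(text)

print(find_words("'bout it's persons'"))
print(find_words("' ''"))
print(find_words("'open' foo''bar"))
```

Because the character class itself contains the apostrophe, tokens such as 'open' and foo''bar are returned verbatim, so any stripping or splitting the question asks for would still need a second step.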

0
votes

I'm submitting this second answer because the question has changed quite a bit and my previous answer is no longer valid. Anyway, if all the conditions are now listed, try this:

(((?<!')')?\b[0-9A-Za-z]+\b('(?!'))?|\b[0-9A-Za-z]+('[0-9A-Za-z]+)*\b)
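
The lookaround version can be exercised the same way. This is a sketch under the assumption that Python's `re` is the engine; `extract` is an illustrative name, and other regex flavors may differ in detail.

```python
import re

# Pattern copied from the answer above; the names below are illustrative.
PATTERN = re.compile(
    r"(((?<!')')?\b[0-9A-Za-z]+\b('(?!'))?|\b[0-9A-Za-z]+('[0-9A-Za-z]+)*\b)"
)

def extract(text):
    # Use group(0) for the whole match; findall() would return
    # tuples of the nested capture groups instead.
    return [m.group(0) for m in PATTERN.finditer(text)]

for sample in ["'bout", "persons'", "foo''bar", "''foo''", "it's", "'", "''"]:
    print(repr(sample), "->", extract(sample))
```

In this engine the doubled-apostrophe cases resolve as the question asks (foo''bar gives foo and bar, ''foo'' gives foo). However, alternation is tried left to right, so for it's the first alternative succeeds with it' and the remainder s is matched separately. Inputs with a single inner apostrophe are worth checking before relying on the pattern.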
0
votes

This works fine:

 ('*)(?:'')*('?(?:\w+'?)+\w+('\b|'?[^']))(\1)

It has no problem with this data:

    'bou
    it's
    persons'
    'open'
    open
    foo''bar
    ''foo
    bee''
    ''foo''
    '
    ''

With this data, you should strip the results (remove spaces from the matches):

    'bou it's persons' 'open' open foo''bar ''foo ''foo'' ' ''

(tested in The Regulator, results in $2)