1
votes

I have a question in a Google Form for which I want to set response validation that enforces "in 25 words or fewer".

The regex I've tried is ^(\b.+){1,25}$, but it isn't working: a single paragraph of more than 25 words passes validation, while two ten-word paragraphs fail.
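Both symptoms can be reproduced outside the form. The sketch below uses Python's `re` module as a stand-in, which is an assumption: the Google Forms engine may differ in detail. It suggests two causes. First, `.+` is free to swallow a whole line in a single repetition, so `{1,25}` never really counts words. Second, `.` does not match a newline, so nothing after the first line can be consumed. The sample strings are made up for illustration.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"^(\b.+){1,25}$")

# 30 words on one line: a single repetition of `.+` can consume all of it.
too_long = " ".join(["word"] * 30)

# Four words over two lines: `.` cannot cross the newline.
two_lines = "one two\nthree four"

print(bool(ATTEMPT.match(too_long)))   # accepted despite having 30 words
print(bool(ATTEMPT.match(two_lines)))  # rejected despite having 4 words
```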

I do want to allow multiple lines/paragraphs because people are people and they'll just get confused if they weren't allowed.

These should pass:

  • one two three? four five six, seven eight nine ten!
  • one two three? four five six, seven eight nine ten! one two three? four five six, seven eight nine ten!
  • here are twenty five simple words in three separate paragraphs.
    one two three? four five six, seven eight nine ten!
    one two three? four five
  • !bang !bang here are words starting with a non-word character
  • here is a sentence ending in a word character
  • here is a sentence ending in a non-word character!

These should fail:

  • one two three? four five six, seven eight nine ten! one two three? four five six, seven eight nine ten! one two three? four five six, seven eight nine ten!
  • one two three? four five six, seven eight nine ten!
    one two three? four five six, seven eight nine ten!
    one two three? four five six, seven eight nine ten!

Suggestions?

2
Try ^(\b\B+){1,25}$. And could you provide your sample lines? - Daniel
@MondKin GoogleForms says that's not a valid regex. - Erics
^(\S+\s*){1,25}$ might be worth a try - TripWire
Also tried ^\w+(?:\W+\w+){0,25}$ from stackoverflow.com/questions/43889293/… but that fails if the text ends with non-word characters. - Erics
So is the problem only with multi-line strings? Check in google-form if they have a special syntax to specify if $ should match at the end of each line, or just the end of the whole text. - Daniel

2 Answers

3
votes

You're looking for

/^(?:\s*\S+(?:\s+\S+){0,24})?\s*$/

which avoids catastrophic backtracking because each repetition always matches exactly one whole word. It is (\s+\S+){0,25} with the first repetition factored out, so that the first word may be preceded by any amount of whitespace, including none (\s*), rather than by at least one whitespace character (\s+).
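As a quick check, the pattern can be run against the samples from the question. This is a minimal sketch in Python's `re` module rather than in Google Forms itself, so it assumes the form's engine treats `\s`, `\S`, `^` and `$` the same way. The helper name `within_limit` is made up for illustration.

```python
import re

# First pattern from this answer, without the JavaScript-style slashes.
LIMIT_25 = re.compile(r"^(?:\s*\S+(?:\s+\S+){0,24})?\s*$")

TEN = "one two three? four five six, seven eight nine ten!"

should_pass = [
    TEN,
    TEN + " " + TEN,
    "here are twenty five simple words in three separate paragraphs.\n"
    + TEN + "\n" + "one two three? four five",
    "!bang !bang here are words starting with a non-word character",
    "here is a sentence ending in a word character",
    "here is a sentence ending in a non-word character!",
]
should_fail = [
    " ".join([TEN] * 3),   # 30 words in one paragraph
    "\n".join([TEN] * 3),  # 30 words in three paragraphs
]


def within_limit(text: str) -> bool:
    """Return True when text has at most 25 whitespace-separated words."""
    return LIMIT_25.match(text) is not None


for text in should_pass:
    assert within_limit(text)
for text in should_fail:
    assert not within_limit(text)
```

Because the whole word group is optional, an empty or whitespace-only answer also passes; if that is not wanted, the form's "required" setting is probably the simpler place to enforce it.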

You could also use the easier-to-read (\s*\S+){0,25} with a negative lookahead to ensure that each repetition matches a whole word:

/^(?:\s*\S+(?!\S)){0,25}\s*$/

Alternatively, a possessive quantifier ({0,25}+) is the cleanest solution if your regex engine supports it, since it forbids backtracking into the repetition altogether.
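For engines that do accept the syntax, a possessive version might look like the sketch below. It uses Python's `re` module, which only understands `{0,25}+` from Python 3.11 onwards, hence the version guard. Whether Google Forms accepts this syntax is not something this example can tell you. The names `POSSESSIVE_25` and `within_limit_possessive` are made up for illustration.

```python
import re
import sys

# Possessive quantifiers are only accepted by Python's `re` from 3.11 on.
if sys.version_info >= (3, 11):
    POSSESSIVE_25 = re.compile(r"^(?:\s*\S+){0,25}+\s*$")
else:
    POSSESSIVE_25 = None  # older interpreters reject the `{0,25}+` syntax


def within_limit_possessive(text: str) -> bool:
    """Return True when text has at most 25 whitespace-separated words."""
    if POSSESSIVE_25 is None:
        raise RuntimeError("possessive quantifiers need Python 3.11 or newer")
    return POSSESSIVE_25.match(text) is not None


if POSSESSIVE_25 is not None:
    print(within_limit_possessive(" ".join(["word"] * 25)))  # exactly 25 words
    print(within_limit_possessive(" ".join(["word"] * 26)))  # one word too many
```

Once the 25 greedy repetitions have each taken a whole word, the engine may not give any of that text back, so an over-long answer is rejected without retrying other ways of splitting the words.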

And of course you can swap out \s/\S for \W/\w if you desire, and then also use a word boundary instead of the lookahead:

/^(?:\W*\w+\b){0,25}\W*$/
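
One thing to keep in mind when swapping is that the two variants do not count the same things: \S+ treats any run of non-whitespace as one word, while \w+ stops at apostrophes and hyphens. The sketch below illustrates this in Python's `re` module, with a limit of 2 instead of 25 to keep the example short. The helper names are made up for illustration.

```python
import re


def limit_by_whitespace(n: int) -> re.Pattern:
    # Words are runs of non-whitespace characters.
    return re.compile(r"^(?:\s*\S+(?!\S)){0,%d}\s*$" % n)


def limit_by_word_chars(n: int) -> re.Pattern:
    # Words are runs of \w characters (letters, digits, underscore).
    return re.compile(r"^(?:\W*\w+\b){0,%d}\W*$" % n)


text = "don't stop"

print(bool(limit_by_whitespace(2).match(text)))  # counts "don't", "stop"
print(bool(limit_by_word_chars(2).match(text)))  # counts "don", "t", "stop"
print(bool(limit_by_word_chars(3).match(text)))
```

With the whitespace-based pattern the text fits a two-word limit. With the \w-based pattern it needs a limit of three, because the apostrophe splits "don't" in two.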

0
votes

Assuming ^ and $ are ok:

^(([^\s]+)\s?){1,25}$

It looks like the trailing \s? inside the repeated group was triggering the catastrophic backtracking. Rewriting so that it is no longer inside the group makes the pattern a bit longer, because the first word and the next 24 are matched separately:

^[^\s]+(\s([^\s]+)){0,24}\s?$

(the \s pattern matches whitespace)
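
A short sketch of how the rewritten pattern behaves, using Python's `re` module as a stand-in for the form's engine (an assumption). Note that it allows exactly one whitespace character between words and none at the start, so paragraphs separated by a blank line are rejected even when the word count is fine.

```python
import re

# Second pattern from this answer.
REWRITE = re.compile(r"^[^\s]+(\s([^\s]+)){0,24}\s?$")

ten = "one two three? four five six, seven eight nine ten!"

print(bool(REWRITE.match(ten)))                  # 10 words, single spaces
print(bool(REWRITE.match(" ".join([ten] * 3))))  # 30 words
print(bool(REWRITE.match("one\n\ntwo")))         # blank line between words
print(bool(REWRITE.match(" one two")))           # leading space
```

Only the first call reports a match. If blank lines or leading whitespace need to be accepted, replacing the single \s with \s+ and allowing \s* at both ends would be one way to relax it.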