
I have the following string:

one two three four five six seven eight nine

I am trying to construct a regular expression that splits the string into three groups:

  1. Group 1: 'one two three'
  2. Group 2: 'four five six'
  3. Group 3: 'seven eight nine'

I have tried variations of (.*\b(one|two|three)?)(.*\b(four|five|six)?)(.*\b(seven|eight|nine)?) but the first group of this pattern captures the entire string - the demo can be found here.

Trying (.*\b(one|two|three))(.*\b(four|five|six))(.*\b(seven|eight|nine)) seems to get me closer to what I want, but the match information panel shows that the pattern identifies two matches, each containing six capture groups.

I am using alternation (the OR operator) because the groups can be of any length. For example, applying the pattern to the string two three four should identify two groups:

  1. Group 1: 'two'
  2. Group 2: 'three four'.

3 Answers


A large regex that probably does it



Readable version

      .* \b 
        |  two
        |  three
        |  four
        |  five
        |  six
        |  seven
        |  eight
        |  nine
 (                             # (1 start)
      (?: one | two | three )

           (?: one | two | three )
 )?                            # (1 end)

 (                             # (2 start)
      (?: four | five | six )

           (?: four | five | six )
 )?                            # (2 end)

 (                             # (3 start)
      (?: seven | eight | nine )

           (?: seven | eight | nine )
 )?                            # (3 end)
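
A smaller sketch along the same lines, in JavaScript. It assumes that a group is simply a run of consecutive words taken from the same alternation (one/two/three, four/five/six, seven/eight/nine); the names BUCKETS, runPattern and groupNumberWords are made up for illustration.

```javascript
// Hypothetical buckets: each inner array is one alternation from the question.
const BUCKETS = [
  ["one", "two", "three"],
  ["four", "five", "six"],
  ["seven", "eight", "nine"],
];

// One alternative per bucket: a word from the bucket, followed by any number
// of further whitespace-separated words from the same bucket.
const runPattern = new RegExp(
  BUCKETS.map((words) => {
    const alt = `(?:${words.join("|")})`;
    return `\\b${alt}\\b(?:\\s+${alt}\\b)*`;
  }).join("|"),
  "g"
);

// Returns every run found in the input, in order of appearance.
function groupNumberWords(input) {
  return input.match(runPattern) || [];
}

console.log(groupNumberWords("one two three four five six seven eight nine"));
console.log(groupNumberWords("two three four six seven eight"));
```

Because the runs are collected with the g flag rather than with numbered capture groups, a missing bucket just produces fewer entries in the result instead of an empty group.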

This answer assumes that you want to find groups of three number words at a time:

x <- c("one two three four five six seven eight nine")
regexp <- gregexpr("\\S+(?:\\s+\\S+){2}", x)
regmatches(x, regexp)[[1]]

[1] "one two three"    "four five six"    "seven eight nine"
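
For readers working outside R, the same pattern can be tried in JavaScript. This is only a minimal sketch; the variable names are illustrative.

```javascript
const x = "one two three four five six seven eight nine";

// With the g flag, match() returns every non-overlapping match as an array,
// or null when nothing matches (hence the fallback to an empty array).
const groupsOfThree = x.match(/\S+(?:\s+\S+){2}/g) || [];

console.log(groupsOfThree);
```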

If you want a more general solution, which doesn't require knowing a priori what the length of the input is (i.e. how many groups of three are present), then you might have to use an iterative approach:

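parts <- strsplit(x, " ")[[1]]
output <- character(0)
# Walk through the words three at a time and join each chunk with spaces
for (i in seq(from=1, to=length(parts), by=3)) {
    # min() keeps a short final chunk from being padded with NA
    chunk <- parts[i:min(i + 2, length(parts))]
    output <- c(output, paste(chunk, collapse=" "))
}
output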

[1] "one two three"    "four five six"    "seven eight nine"
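
The iterative idea is not specific to R. A rough JavaScript sketch of it follows, where chunkWords is a hypothetical helper name and the chunk size of 3 is taken from the example:

```javascript
// Hypothetical helper: join whitespace-separated words into chunks of `size`.
function chunkWords(input, size = 3) {
  const parts = input.trim().split(/\s+/);
  const output = [];
  for (let i = 0; i < parts.length; i += size) {
    // slice() stops at the end of the array, so a short final chunk is kept
    output.push(parts.slice(i, i + size).join(" "));
  }
  return output;
}

console.log(chunkWords("one two three four five six seven eight nine"));
```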

I'm not quite sure what your desired output is. However, this expression matches your sample and creates several separate capturing groups that are simple to reference:




If this isn't the expression you wanted, you can modify it on regex101.com.
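
One possible modification, sketched under the assumption that only the three outer groups are wanted: the expression used in the JavaScript demo further down has six capture groups because each outer group wraps a capturing alternation. Making the inner alternations non-capturing with (?:...) leaves three. The names threeGroups and found are illustrative.

```javascript
// Same shape as the demo expression, but the inner alternations are
// non-capturing, so each match exposes exactly three groups.
const threeGroups = /((?:one|two|three)\s.*?)((?:four|five|six)\s.*?)((?:seven|eight|nine)\s.*)/;

const found = threeGroups.exec("one two three four five six seven eight nine");

// Index 0 is the full match; the capture groups start at index 1.
const captured = found.slice(1).map((s) => s.trim());

console.log(captured.length); // number of capture groups
console.log(captured);
```

The trim() call is there because the first two groups end on the whitespace that precedes the next group.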

RegEx Circuit

You can also visualize your expressions in jex.im.

JavaScript Demo

This snippet shows what the various capturing groups return:

const regex = /((one|two|three)\s.*?)((four|five|six)\s.*?)((seven|eight|nine)\s.*)/gm;
const str = `one two three four five six seven eight nine

two three four six seven eight`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}