I think there are basically three easy ways of extracting multiple capture groups in R (without using substitution): str_match_all, str_extract_all, and the regmatches/gregexpr combo.
I like @kohske's regex, which looks behind for an open parenthesis, (?<=\\(), looks ahead for a closing parenthesis, (?=\\)), and lazily grabs everything in the middle, .+?; in other words, (?<=\\().+?(?=\\))
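To see why the lazy quantifier matters, here is a minimal base R sketch. The input string j is made up for illustration (the original j is not shown in this answer), and perl = TRUE is set because the pattern relies on lookarounds.

```r
# Hypothetical input; the real `j` from the question is not reproduced here.
j <- "I (wonder) if they (groan) or (Laugh)"

# Lookbehind for "(", lazy match, lookahead for ")".
pattern <- "(?<=\\().+?(?=\\))"

# gregexpr() locates every match; regmatches() pulls out the matched text.
inner <- regmatches(j, gregexpr(pattern, j, perl = TRUE))[[1]]
print(inner)

# For contrast, a greedy .+ runs on to the last closing parenthesis,
# so the whole span collapses into a single match.
greedy <- regmatches(j, gregexpr("(?<=\\().+(?=\\))", j, perl = TRUE))[[1]]
print(greedy)
```

With the lazy version each parenthesised word comes back separately; with the greedy one you get a single long string that still contains parentheses.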
Using the same regex:
str_match_all returns the answer as a list of matrices, one per input string.
str_match_all(j, "(?<=\\().+?(?=\\))")
[[1]]
     [,1]
[1,] "wonder"
[2,] "groan"
[3,] "Laugh"
# Subset the matrix like this...
str_match_all(j, "(?<=\\().+?(?=\\))")[[1]][,1]
[1] "wonder" "groan" "Laugh"
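The matrix shape exists because str_match_all also reports capture groups: column 1 holds the full match and each further column holds one group. A small sketch of that, assuming stringr is installed and using a made-up input string and an explicit capture group in place of the lookarounds:

```r
library(stringr)

# Hypothetical input, made up for illustration.
j <- "I (wonder) if they (groan) or (Laugh)"

# One explicit capture group instead of lookarounds.
m <- str_match_all(j, "\\((.+?)\\)")[[1]]

m[, 1]  # full matches, parentheses included
m[, 2]  # contents of the first capture group
```

With the lookaround regex there are no groups, so the matrix has only the one column.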
str_extract_all returns the answer as a list.
str_extract_all(j, "(?<=\\().+?(?=\\))")
[[1]]
[1] "wonder" "groan" "Laugh"
# Subset the list...
str_extract_all(j, "(?<=\\().+?(?=\\))")[[1]]
[1] "wonder" "groan" "Laugh"
regmatches/gregexpr also returns the answer as a list. Since this is a base R option, some people prefer it. Note that perl = TRUE is needed here, because base R's default regex engine does not support lookarounds.
regmatches(j, gregexpr("(?<=\\().+?(?=\\))", j, perl = TRUE))
[[1]]
[1] "wonder" "groan" "Laugh"
# Subset the list...
regmatches(j, gregexpr("(?<=\\().+?(?=\\))", j, perl = TRUE))[[1]]
[1] "wonder" "groan" "Laugh"
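The list wrapper makes more sense once the input has more than one string: you get one list element per input, including empty ones. A minimal base R sketch, with an input vector x invented for illustration:

```r
# Hypothetical inputs: two matches, none, and one.
x <- c("a (b) c (d)", "no parentheses here", "(e)")

res <- regmatches(x, gregexpr("(?<=\\().+?(?=\\))", x, perl = TRUE))

lengths(res)  # number of matches per input string
unlist(res)   # all matches flattened into one character vector
```

If you only ever pass a single string, [[1]] is enough; unlist() is handy when you do not care which input a match came from.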
Hopefully, the SO community will correct/edit this answer if I've mischaracterized the most popular options.