Extracting values from a long string and create new columns based on the number of brackets in r

Question

substr() might be a great way to extract values with conditions (in our case, the condition is extracting values from brackets), but is there any handy way to extract multiple of them and create multiple columns (new columns number is the same as the number of extracted values).

Here is one example data:

        index  abc
  1     1      qwer(urt123) qweqwe
  2     2      rte(ret390) qweqwe(tertr213) ityorty(ret435)
  3     3      <NA>
  4     4      ogi(wqe685) qwe(ieow123)
  5     5      cvb(bnm567)

code for creating the question data frame:

data.frame(index = c(1:5),
           abc = c("qwer(urt123) qweqwe", "rte(ret390) qweqwe(tertr213) ityorty(ret435)",
                    NA, "ogi(wqe685) qwe(ieow123)", "cvb(bnm567)"))

Final results:

        index  abc                                          abc1     abc2     abc3
  1     1      qwer(urt123) qweqwe                          urt123   <NA>     <NA>
  2     2      rte(ret390) qweqwe(tertr213) ityorty(ret435) ret390   tertr213 ret435
  3     3      <NA>                                         <NA>     <NA>     <NA>
  4     4      ogi(wqe685) qwe(ieow123)                     wqe685   ieow123  <NA>
  5     5      cvb(bnm567)                                  bnm567   <NA>     <NA>

The original data set has more than 10,000 lines and the number of brackets in the abc column could be more or less than 3.

ThomasIsCoding ThomasIsCoding · Accepted Answer · 2020-02-21T21:34:24

Here is a base R solution

dfout <- cbind(df,
               gsub("\\(|\\)",
                    "",
                    do.call(rbind,
                            lapply(z <- with(df,regmatches(abc,gregexpr("\\(\\w+\\)",abc))),
                                   `length<-`,
                                   max(lengths(z))))))

such that

> dfout
  index                                          abc      1        2      3
1     1                          qwer(urt123) qweqwe urt123     <NA>   <NA>
2     2 rte(ret390) qweqwe(tertr213) ityorty(ret435) ret390 tertr213 ret435
3     3                                         <NA>   <NA>     <NA>   <NA>
4     4                     ogi(wqe685) qwe(ieow123) wqe685  ieow123   <NA>
5     5                                  cvb(bnm567) bnm567     <NA>   <NA>

Extracting values from a long string and create new columns based on the number of brackets in r

2 Answers