How to use a regexp to return a small part of a string based on a pattern

Question

This should be easy for anyone who understands regular expressions, which I'm struggling to do.

I have a vector of strings that looks like

strings<-c("jklsflk fKASJLJ (LN/WEC/WPS); jsdfjDFSDKTdfkls jfdjk kdkd(LN/WEC/WPS)",
"PEARYMP PEARYVIRGN_16 1 (LN/MP/MP)",
"08VERMLN XF03 08VERMLN_345_3 (XF/CIN/*)")

I want to convert this vector into a data frame in which each row comes from one element of the original vector, with 3 columns taken from the slash-separated parts inside the parentheses. So the result here would be

col1        col2       col3
"LN"        "WEC"      "WPS"
"LN"        "MP"       "MP"
"XF"        "CIN"      "*"

If there is more than one instance of the pattern in a string, it should take the first instance.

I think my main problem is that ( is a special character. I'm trying to escape it with \( but I get an error that \( is an unrecognized escape, so I'm a little lost.

eddi · Accepted Answer · 2014-04-14T19:19:00

It sounds like you're forgetting to escape the backslash itself: inside an R string literal, \( has to be written as \\(:

do.call(rbind, strsplit(sub('.*?\\((.*?)\\).*', '\\1', strings), split = "/"))
     [,1] [,2]  [,3] 
[1,] "LN" "WEC" "WPS"
[2,] "LN" "MP"  "MP" 
[3,] "XF" "CIN" "*"
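
The one-liner above returns a character matrix, while the question asks for a data frame. As a rough sketch of the same idea broken into steps (not the answerer's code), the version below uses base R's regexpr() and regmatches() to take only the first parenthesised group, then splits it on "/". The names first, inner, parts and result are made up for illustration, and it assumes every string contains at least one (.../.../...) group with exactly three parts; regmatches() silently drops strings that have no match.

```r
strings <- c("jklsflk fKASJLJ (LN/WEC/WPS); jsdfjDFSDKTdfkls jfdjk kdkd(LN/WEC/WPS)",
             "PEARYMP PEARYVIRGN_16 1 (LN/MP/MP)",
             "08VERMLN XF03 08VERMLN_345_3 (XF/CIN/*)")

# In an R string literal, "\\(" is the two characters \( which the regex
# engine then reads as a literal opening parenthesis.
# regexpr() reports only the first match in each string.
m <- regexpr("\\([^)]*\\)", strings)
first <- regmatches(strings, m)               # e.g. "(LN/WEC/WPS)"

# Strip the surrounding parentheses, then split on a literal "/".
inner <- substr(first, 2, nchar(first) - 1)
parts <- do.call(rbind, strsplit(inner, "/", fixed = TRUE))

# Assumption: exactly three parts per string, so rbind gives a 3-column matrix.
result <- as.data.frame(parts, stringsAsFactors = FALSE)
names(result) <- c("col1", "col2", "col3")
print(result)
```

Using fixed = TRUE in strsplit() treats "/" as plain text rather than a pattern, so no further escaping is needed there.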
