Imagine an R data frame with numerous string columns. In each row, every string holds a sequence of numbers in scientific notation, each preceded by a letter and the whole wrapped in square brackets. Here is a simplified example:
df <- data.frame(id = 1:3,
                 vec1 = c("[a-4.16121967e-02 b4.51207198e-02 c-7.89282843e-02 d4.02516453e-03]",
                          "[a-7.52146867e-02 b3.78264938e-02 c-1.03749274e-02 d4.02516453e-03]",
                          "[a-2.13926377e-02 b9.27949827e-02 c-5.89836483e-02 d2.44455224e-03]"),
                 vec2 = c("[a-4.16121967e-02 b4.51207198e-02 c-7.89282843e-02 d4.02516453e-03]",
                          "[a-7.40210414e-02 b1.75862815e-02 c-1.03749274e-02 d4.02516453e-03]",
                          "[a-6.73705637e-02 b9.27949827e-02 c-8.35041553e-02 d2.44455224e-03]"))
I'm looking for a fast solution, preferably with dplyr, that converts the vector columns into list columns holding one numeric vector per row. Speed matters because the data frame I'm actually working with contains many more and much longer vectors.
So far I managed to remove the unnecessary characters and separate the vector elements by commas like this:
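library(dplyr)
library(stringr)
# Strip the brackets and letter prefixes, then turn the separating spaces into commas
df <- mutate(df,
             vec1 = str_replace_all(vec1, "\\[|\\]|a|b|c|d", ""),
             vec1 = str_replace_all(vec1, " ", ","),
             vec2 = str_replace_all(vec2, "\\[|\\]|a|b|c|d", ""),
             vec2 = str_replace_all(vec2, " ", ","))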
There may be a better and more elegant solution for this step. While we're at it, I'd also like to know how to do this with mutate_at() and starts_with("vec") so that all my columns are fixed at once.
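For illustration, the cleaning step could perhaps be applied to every "vec" column at once along the following lines. This is only a sketch: the data frame `toy` and the helper `clean_vec` are hypothetical names made up here, the strings are shortened stand-ins for the real data, and it assumes the only characters to strip are the brackets and the letters a to d.

```r
library(dplyr)
library(stringr)

# Hypothetical two-row stand-in for the real data (shortened numbers)
toy <- data.frame(id = 1:2,
                  vec1 = c("[a-4.1e-02 b4.5e-02]", "[a-7.5e-02 b3.7e-02]"),
                  vec2 = c("[a-4.1e-02 b4.5e-02]", "[a-7.4e-02 b1.7e-02]"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: drop brackets and the letters a-d in one pass,
# then replace the separating spaces with commas.
# The "e" of the scientific notation is outside a-d and is kept.
clean_vec <- function(x) {
  x <- str_remove_all(x, "[\\[\\]a-d]")
  str_replace_all(x, " ", ",")
}

# mutate_at() applies clean_vec to every column whose name starts with "vec"
cleaned <- mutate_at(toy, vars(starts_with("vec")), clean_vec)
```

With dplyr 1.0.0 or later, `mutate(toy, across(starts_with("vec"), clean_vec))` should be the equivalent call.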
More importantly, I'm struggling with the conversion to numeric vectors. The result should be 2 list columns, each holding one numeric vector of 4 elements per row. So far I have only managed to extract and convert a single vector like this:
as.numeric(unlist(strsplit(df[1,'vec1'], ",")))
However, I'd like to avoid a loop through all the vectors. Any help is highly appreciated.
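To make the desired result concrete, here is a rough sketch of one way the list columns might be built without an explicit loop over rows. It is not a definitive solution: `toy` and `to_num_list` are hypothetical names, the strings are shortened stand-ins, and the regular expression assumes every number looks like the ones in the example (optional minus sign, digits with a decimal point, optional exponent).

```r
library(dplyr)
library(stringr)

# Hypothetical two-row stand-in for the real data (shortened numbers)
toy <- data.frame(id = 1:2,
                  vec1 = c("[a-4.1e-02 b4.5e-02]", "[a-7.5e-02 b3.7e-02]"),
                  vec2 = c("[a-4.1e-02 b4.5e-02]", "[a-7.4e-02 b1.7e-02]"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: str_extract_all() is vectorised over the column and
# returns one character vector of matched numbers per row; lapply() then
# converts each of those to numeric, giving a list with one vector per row.
to_num_list <- function(x) {
  lapply(str_extract_all(x, "-?[0-9.]+(?:e[+-]?[0-9]+)?"), as.numeric)
}

# Every column starting with "vec" becomes a list column
result <- mutate_at(toy, vars(starts_with("vec")), to_num_list)

# One numeric vector per row and column
result$vec1[[1]]
```

Extracting the numbers directly skips the separate cleaning step. If the strings have already been cleaned and comma-separated, `lapply(strsplit(x, ","), as.numeric)` should give the same kind of list, since strsplit() is also vectorised.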