Split and clean multiple length strings in a column to multiple columns using R script

Question

I have data in the following format:

test1 <- data.frame(value = c('25.5 (5%);  39.65 (23%)', '28.15(5%) and 55.66 (34%) and 33.26   (14%)', '45   56.9565', '95.6666 (55%)  89.2343(90%)   51.56 (28%)'))
test2 <- data.frame(value = c('36.5', '55.658', '47.8', '51.562'))

I need to split the values in column test1 into three columns (col1, col2 and col3) and then compare and highlight the value in the column (test2) that is within +/- 0.1 of the value in one of the three columns (col1, col2 and col3) as shown in the image below.

Please suggest on how to proceed with this.

col1    col2    col3    test2
25.5    39.65           36.5
28.15   55.66   33.26   **55.658**
45      56.9565         47.8
95.6666 89.2343 51.56   **51.562**

string split (and cleaned) into columns

akrun akrun · Accepted Answer · 2016-11-10T03:13:17

We can use gsub with read.table to extract the 'value' column into three columns

df1 <- read.table(text=gsub("\\([^)]+\\)|[A-Za-z]+", "", test1$value), 
                    header=FALSE, fill=TRUE, col.names = paste0("col", 1:3))

and cbind it with the 'test2'

df2 <- cbind(df1, test2)
df2
#    col1    col2  col3  value
#1 25.5000 39.6500    NA   36.5
#2 28.1500 55.6600 33.26 55.658
#3 45.0000 56.9565    NA   47.8
#4 95.6666 89.2343 51.56 51.562

Update

With the new data

cbind(read.table(text=gsub("\\([^)]+\\)|[A-Za-z]+|[;,]\\s*", "", 
   test1$value), header=FALSE, fill=TRUE, col.names = paste0("col", 1:3)), test2)
#    col1    col2  col3  value
#1 25.5000 39.6500    NA   36.5
#2 28.1500 55.6600 33.26 55.658
#3 45.0000 56.9565    NA   47.8
#4 95.6666 89.2343 51.56 51.562

Split and clean multiple length strings in a column to multiple columns using R script

1 Answers

Update