1
votes

I have 2 data frames of unequal size and df1 has a comma separated values.

df1 <- data.frame(col1=c("id1", "id2", "id3", "id4", "id5", "id6", "id7"),
                  col2=c("1,2", "2", "1,2,3", "1,2,3,4,5", "4", "2", "1"))

Output of df1
-----------------
  col1      col2
1  id1       1,2
2  id2         2
3  id3     1,2,3
4  id4 1,2,3,4,5
5  id5         4
6  id6         2
7  id7         1


df2 <- data.frame(col1=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16),
                  col2=c("id1", "id1", "id1", "id1", "id1", "id1", "id2", "id3", "id3", "id4", "id4", "id4", "id5", "id6", "id7", "id7"), 
                  col3=c(1, 1, 1, 2, 3, 4, 2, 4, 2, 1, 4, 5, 1, 1, 2, 3))

df2$newCol <- 0

Output of df2
---------------
   col1 col2 col3 newCol
1     1  id1    1      0
2     2  id1    1      0
3     3  id1    1      0
4     4  id1    2      0
5     5  id1    3      0
6     6  id1    4      0
7     7  id2    2      0
8     8  id3    4      0
9     9  id3    2      0
10   10  id4    1      0
11   11  id4    4      0
12   12  id4    5      0
13   13  id5    1      0
14   14  id6    1      0
15   15  id7    2      0
16   16  id7    3      0

I would like to update 'newCol' rows with 1 if following condition satisfies values in df2$col2 == df1$col1 and df2$col3 %in% df2$col2 and the rest should be 0.

Here is the output I expect

    col1 col2 col3 newCol
1     1  id1    1      1
2     2  id1    1      1
3     3  id1    1      1
4     4  id1    2      1
5     5  id1    3      0
6     6  id1    4      0
7     7  id2    2      1
8     8  id3    4      0
9     9  id3    2      1
10   10  id4    1      1
11   11  id4    4      1
12   12  id4    5      1
13   13  id5    1      0
14   14  id6    1      0
15   15  id7    2      0
16   16  id7    3      0

I tried to use unnest df1 values into its own rows and use merge but it does not seem to be working as expected.

Any suggestion to achieve this efficiently would be appreciated.

1

1 Answers

2
votes

You could use grepl to detect whether the vector is in col2. This will be after mergeing the data

transform(merge(df2,df1,by.x="col2",by.y="col1"),newcol = +Vectorize(grepl)(col3,col2.y))[-4]

   col2 col1 col3 newcol
1   id1    1    1      1
2   id1    2    1      1
3   id1    3    1      1
4   id1    4    2      1
5   id1    5    3      0
6   id1    6    4      0
7   id2    7    2      1
8   id3    8    4      0
9   id3    9    2      1
10  id4   10    1      1
11  id4   11    4      1
12  id4   12    5      1
13  id5   13    1      0
14  id6   14    1      0
15  id7   15    2      0
16  id7   16    3      0