I have 2 dataframes.
df1-
col1 col2 col3 col4 col5 name1 A 23 x y name1 A 29 x y name1 B 17 x y name1 A 77 x y
df2-
col1 col2 col3 B 17 LL1 Z 193 KK1 A 77 LO9 Y 80 LK2
I want to return those rows from df1 if col2 and col3 of df1 are not equal to col1 and col2 of df2.
The output should be-
col1 col2 col3 col4 col5 name1 A 23 x y name1 A 29 x y
Solution I found-
unique.rows <- function (df1, df2) {
out <- NULL
for (i in 1:nrow(df1)) {
found <- FALSE
for (j in 1:nrow(df2)) {
if (all(df1[i,2:3] == df2[j,1:2])) {
found <- TRUE
break
}
}
if (!found) out <- rbind(out, df1[i,])
}
out
}
This solution is working fine but initially, I was applying for small dataframes. Now my df1 has about 10k rows and df2 has about 7 million rows. It is just running and running from last 2 days. Could anyone please suggest a fast way to do this?