0
votes

I want to compare two data frames in R that have the same column names (df1 and df2). Based on the values in each column of one of them (df2), I want to filter the other (df1). I need to eliminate every row in df1 in which any value is greater than or equal to the value in df2 for the same column. In other words, I need to produce res1 below:

df1 <- data.frame( v1 = c(1,2,3,4), v2 = c(2, 10, 5, 11), v3=c(20, 25, 23, 2), v4=c(1,2,1,3) )  

> df1
  v1 v2 v3 v4
1  1  2 20  1
2  2 10 25  2
3  3  5 23  1
4  4 11  2  3

df2 <- data.frame(v1 = 4, v2 = 10, v3 =30, v4 = 3)

> df2
  v1 v2 v3 v4
1  4 10 30  3

So the desired output, res1, is generated by comparing each row in df1 with df2 column by column (matched by name) and eliminating the rows of df1 in which any value is greater than or equal to the threshold defined for that column in df2:

> res1
  v1 v2 v3 v4
1  1  2 20  1
2  3  5 23  1 
3
Will df2 always be a one-row data frame? What if it has multiple rows? Should we compare against each row? - Ronak Shah
@RonakShah It is always a one-row data frame. In df2 I define the threshold values for deleting rows in df1. - Makaroni

3 Answers

3
votes

We can use mapply with the `<` operator to compare the two data frames column by column, and then use rowSums to build a logical index for subsetting, i.e.

df1[rowSums(mapply(`<`, df1, df2)) == ncol(df1),]
#  v1 v2 v3 v4
#1  1  2 20  1
#3  3  5 23  1

Additionally, a fully vectorized translation of the above (courtesy of @RonakShah) is

df1[rowSums(df1 < df2[rep(1, nrow(df1)), ]) == ncol(df1), ]
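To make the intermediate steps easier to follow, here is a minimal sketch that splits the mapply one-liner into named pieces. The names `below` and `keep` are only illustrative and are not part of the original answer.

```r
df1 <- data.frame(v1 = c(1, 2, 3, 4), v2 = c(2, 10, 5, 11),
                  v3 = c(20, 25, 23, 2), v4 = c(1, 2, 1, 3))
df2 <- data.frame(v1 = 4, v2 = 10, v3 = 30, v4 = 3)

# One logical column per variable: TRUE where the df1 value is
# strictly below the threshold for that column in df2
below <- mapply(`<`, df1, df2)

# A row is kept only if every column passed the test, i.e. the
# number of TRUEs in the row equals the number of columns
keep <- rowSums(below) == ncol(df1)

res1 <- df1[keep, ]
print(res1)
```

Printing `below` before subsetting is a quick way to see which column caused a given row to be dropped.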
2
votes

We can use apply row-wise and check whether all the elements in each row are less than the corresponding thresholds in the other data frame

df1[apply(df1, 1, function(x) all(x < df2[1, ])), ]

#  v1 v2 v3 v4
#1  1  2 20  1
#3  3  5 23  1
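The approaches in this thread pair the columns of df1 and df2 by position. Since the question asks for a comparison by column name, here is a rough sketch of the same row-wise idea that first reorders the thresholds by name. The helper name `filter_below` and the shuffled data frame are hypothetical, and the sketch assumes all columns are numeric.

```r
# Hypothetical helper: keep the rows of `data` whose values are all
# strictly below the one-row `thresholds`, matching columns by name
filter_below <- function(data, thresholds) {
  stopifnot(nrow(thresholds) == 1,
            all(names(data) %in% names(thresholds)))
  # Reorder the thresholds to follow the column order of `data`
  limits <- unlist(thresholds[1, names(data), drop = FALSE])
  # Assumes numeric columns, because apply() converts `data` to a matrix
  keep <- apply(data, 1, function(row) all(row < limits))
  data[keep, , drop = FALSE]
}

df1 <- data.frame(v1 = c(1, 2, 3, 4), v2 = c(2, 10, 5, 11),
                  v3 = c(20, 25, 23, 2), v4 = c(1, 2, 1, 3))
# Same thresholds as df2 in the question, but in a different column order
df2_shuffled <- data.frame(v4 = 3, v1 = 4, v3 = 30, v2 = 10)

res <- filter_below(df1, df2_shuffled)
print(res)
```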
1
votes

Here is another option using Reduce with Map:

df1[Reduce(`&`, Map(`<`, df1, df2)),]
#   v1 v2 v3 v4
#1  1  2 20  1
#3  3  5 23  1
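As a small illustration of what the Reduce step does, the sketch below keeps the list returned by Map and compares the folded result with the same `&` chain written out by hand. The names `parts`, `keep` and `by_hand` are only illustrative.

```r
df1 <- data.frame(v1 = c(1, 2, 3, 4), v2 = c(2, 10, 5, 11),
                  v3 = c(20, 25, 23, 2), v4 = c(1, 2, 1, 3))
df2 <- data.frame(v1 = 4, v2 = 10, v3 = 30, v4 = 3)

# Map returns a list with one logical vector per column
parts <- Map(`<`, df1, df2)

# Reduce folds the list with `&`, so a row is TRUE only if it is
# TRUE in every column
keep <- Reduce(`&`, parts)

# The same fold written out explicitly for the four columns
by_hand <- parts$v1 & parts$v2 & parts$v3 & parts$v4

print(identical(keep, by_hand))
print(df1[keep, ])
```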

Or using the tidyverse:

library(dplyr)
library(purrr)
map2(df1, df2, `<`) %>% 
       reduce(`&`) %>% 
       df1[.,]
#   v1 v2 v3 v4
#1  1  2 20  1
#3  3  5 23  1