2
votes

I have two dataframes (df1 and df2), and I want a new dataframe (df3) containing all rows where "date" AND "time_of_day" of df1 match with df2. And save the rows of df1 that don't match as well in a new dataframe (df4).

I tried using dplyr filter function, but it seems like I am not writing it correctly, as I am getting a new dataframe of the same length as df1 but it should show me only the matching rows based on both variables date and time of day.

> df1
          date time_of_day     
1  2018-06-03     morning 
2  2018-06-06     afternoon 
4  2018-06-09     morning 
5  2018-06-10     afternoon 

> df2
          date time_of_day     
1  2018-06-03     morning 
2  2018-06-06     morning 
3  2018-06-08     morning 
4  2018-06-09     morning 
5  2018-06-10     afternoon
6  2018-06-11     afternoon

#creating a new data frame
df3 <- filter(df1, date %in% df2$date & time_of_day %in% df2$time_of_day)
#another try 
df3 <- df1[df1$date %in% df2$date & df1$time_of_day %in% df2$time_of_day,]

This is what I want:

> df3
          date time_of_day     
1  2018-06-03     morning 
2  2018-06-09     morning 
3  2018-06-10     afternoon 

> df4
          date time_of_day     
1  2018-06-06     afternoon 
2
Try inner_join(df1, df2) and anti_join(df1, df2) - akrun

2 Answers

3
votes

We can do this with inner_join

library(dplyr)
df3 <- inner_join(df1, df2)
df3
#       date time_of_day
#1 2018-06-03     morning
#2 2018-06-09     morning
#3 2018-06-10   afternoon

and anti_join

df4 <- anti_join(df1, df2)
df4
#       date time_of_day
#1 2018-06-06   afternoon

data

df1 <- structure(list(date = c("2018-06-03", "2018-06-06", "2018-06-09", 
"2018-06-10"), time_of_day = c("morning", "afternoon", "morning", 
"afternoon")), class = "data.frame", row.names = c("1", "2", 
"4", "5"))

df2 <- structure(list(date = c("2018-06-03", "2018-06-06", "2018-06-08", 
"2018-06-09", "2018-06-10", "2018-06-11"), time_of_day = c("morning", 
"morning", "morning", "morning", "afternoon", "afternoon")),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
2
votes

Altering your base R code, you could do this (below). And you can wrap either in unique() if you want duplicate rows removed.

df1[paste0(df1$date, df1$time_of_day) %in% paste0(df2$date, df2$time_of_day), ]
        date time_of_day
1 2018-06-03     morning
4 2018-06-09     morning
5 2018-06-10   afternoon

and

df1[!paste0(df1$date, df1$time_of_day) %in% paste0(df2$date, df2$time_of_day), ]
        date time_of_day
2 2018-06-06   afternoon

Your attempts before did not work because df1$date %in% df2$date & df1$time_of_day %in% df2$time_of_day evaluates to TRUE TRUE TRUE TRUE. So it retained all rows. That is to say: all dates in df1 are in df2 & all times of day in df1 are in df2.

EDIT:

Alternatively, in dplyr you could use intersect and setdiff which work with dataframes and remove duplicates:

dplyr::intersect(df1, df2)
        date time_of_day
1 2018-06-03     morning
2 2018-06-09     morning
3 2018-06-10   afternoon

dplyr::setdiff(df1, df2)
        date time_of_day
1 2018-06-06   afternoon