0
votes

I've got two data frames - Employee punch data and employee names data:

DF1

    punch_out punch_in date employee_number
 1  16:00:00  06:00:00 2018-01-01 00000001
 2  15:00:00  08:00:00 2018-08-01 00000001

DF2

employee_numb  job_title  start_date end_date
00000001        worker    2017-08-05 2018-07-01
00000001        manager   2018-07-01 3000-01-01

I need to join them so that in DF1 I would have a new column - "Job title" which will properly reflect the actual job title, according to the date.

My struggles are around the date conditions. So from the examples above: according to the example dates observation 1 should have job title "worker", but observation two has to have "manager".

If I do traditional join - it duplicates the records and I will have two rows for each DF1 row and employee 00000001 on 2018-01-01 will be both worker and manager.

The result should look like this

    punch_out punch_in date employee_number Job Title
 1  16:00:00  06:00:00 2018-01-01 00000001  worker
 2  15:00:00  08:00:00 2018-08-01 00000001  manager
2
Questions to the SO tag should have reproducible examples. As it is we don't really know what you have. To do that show the output of dput(DF1) and dput(DF2) in the body of the question. - G. Grothendieck

2 Answers

3
votes

The sqldf package is one option here, which lets us phrase dataframe joins using SQL syntax:

library(sqldf)

sql <- "SELECT Df1.punch_out, Df1.punch_in, Df1.date, Df1.employee_number, Df2.job_title
        FROM Df1
        LEFT JOIN Df2
            ON Df1.date BETWEEN Df2.start_date AND Df2.end_date"
result <- sqldf(sql)
0
votes

Could also do:

library(data.table)

setkey(setDT(DF2)[, start_date := as.Date(start_date)], employee_numb, start_date)
setkey(setDT(DF1)[, date := as.Date(date)], employee_number, date)

DF2[DF1, roll = T, .(punch_out, punch_in, employee_number, job_title)]

If your columns are already dates, you can just do:

setkey(setDT(DF2), employee_numb, start_date)
setkey(setDT(DF1), employee_number, date)

DF2[DF1, roll = T, .(punch_out, punch_in, employee_number, job_title)]

The data used:

DF2 <- structure(list(employee_numb = c("00000001", "00000001"), job_title = structure(2:1, .Label = c("manager", 
                                                                                                       "worker"), class = "factor"), start_date = structure(c(17383, 
                                                                                                                                                              17713), class = "Date"), end_date = structure(1:2, .Label = c("2018-07-01", 
                                                                                                                                                                                                                            "3000-01-01"), class = "factor")), row.names = c(NA, -2L), class = "data.frame")

DF1 <- structure(list(punch_out = structure(2:1, .Label = c("15:00:00", 
                                                            "16:00:00"), class = "factor"), punch_in = structure(1:2, .Label = c("06:00:00", 
                                                                                                                                 "08:00:00"), class = "factor"), date = structure(c(17532, 17744
                                                                                                                                 ), class = "Date"), employee_number = c("00000001", "00000001"
                                                                                                                                 )), row.names = c(NA, -2L), class = "data.frame")