1
votes

I am fairly new to R and DPLYR and I am stuck on a this issue:

I have two tables:

(1) Repairs done on cars

enter image description here

(2) Amount owed on each car over time

enter image description here

What I would like to do is create three extra columns on the repair table that gives me: (1) the amount owed on the car when the repair was done, (2) 3months down the road and (3) finally last payment record on file.

And if the case where the repair date does not match with any payment record, I need to use the closest amount owed on record.

So something like:

enter image description here

Any ideas how I can do that?

Here are the data frames:

Repairs done on cars:

 df_repair <- data.frame(unique_id = 
 c("A1","A2","A3","A4","A5","A6","A7","A8"),
 car_number = c(1,1,1,2,2,2,3,3),
 repair_done = c("Front Fender","Front 
 Lights","Rear Lights","Front Fender", "Rear Fender","Rear Lights","Front 
 Lights","Front Fender"),
 YearMonth = c("2014-03","2016-03","2016-07","2015-05","2015-08","2016-01","2018-01","2018-05"))


df_owed <- data.frame(car_number = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3),
                      YearMonth = c("2014-02","2014-05","2014-06","2014-08","2015-06","2015-12","2016-03","2016-04","2016-05","2016-06","2016-07","2016-08","2015-05","2015-08","2015-12","2016-03","2018-01","2018-02","2018-03","2018-04","2018-05","2018-09"),

amount_owed = c(20000,18000,17500,16000,10000,7000,6000,5500,5000,4500,4000,3000,10000,8000,6000,0,50000,40000,35000,30000,25000,15000))
1
@akrun Thank you for your prompt response. there were some missing commas, it should work nowAli Parahoo

1 Answers

1
votes

Using zoo for year-months, and tidyverse, you could try the following. Using left_join add all the df_owed data to your df_repair data, by the car_number. You can convert your year-month columns to yearmon objects with zoo. Then, sort your rows by the year-month column from df_owed.

For each unique_id (using group_by) you can create your three columns of interest. The first will use the latest amount_owed where the owed date is prior to the service date. Then second (3 months) will use the first amount_owed value where the owed date follows the service date by 3 months (3/12). Finally, the most recent take just the last value from amount_owed.

Using the example data, the results differ a bit, possibly due to the data frames not matching the images in the post.

library(tidyverse)
library(zoo)

df_repair %>%
  left_join(df_owed, by = "car_number") %>%
  mutate_at(c("YearMonth.x", "YearMonth.y"), as.yearmon) %>%
  arrange(YearMonth.y) %>%
  group_by(unique_id, car_number) %>%
  summarise(
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
    owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
    owed_most_recent = last(amount_owed)
  )