3
votes

I have a data frame like DF below which will be imported directly from the database (as tibble).

library(tidyverse)
library(lubridate)


date_until <- dmy("31.05.2019")
date_val  <- dmy("30.06.2018")

DF <-  data.frame( date_bal   = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
                   department = c("A","A","A","B","B"),
                   amount     = c(10,20,30,40,50)
)

DF <- DF %>%
  as_tibble()
DF

It represents the amount of money spent by each department in a specific month. My task is to project how much money will be spent by each department in the following months until a specified date in the future (in this case date_until=31.05.2019)

I would like to use tidyverse in order to generate additional rows for each department where the first column date_bal would be a sequence of dates from the last one from "original" DF up until date_until which is predefined. Then I would like to add additional column called "DIFF" which would represent the difference between DATE_BAL and DATE_VAL, where DATE_VAL is also predefined. My final result would look like this: Final result

I have managed to do this in the following way:

  1. first filter data from DF for department A
  2. Create another DF2 by populating it with date sequence from min(dat_bal) to date_until from 1.
  3. Merge data frames from 1. and 2. and then add calculated columns using mutate

Since I will have to repeat this procedure for many departments I wonder if it's possible to add rows (create date sequence) in existing DF (without creating a second DF and then merging).

Thanks in advance for your help and time.

1

1 Answers

2
votes

I add one day to the dates, create a sequence and then rollback to the last day of the previous month.

seq(min(date_val + days(1)), date_until + days(1), by = 'months')[-1] %>% 
  rollback() %>% 
  tibble(date_bal = .) %>% 
  crossing(DF %>% distinct(department)) %>% 
  bind_rows(DF %>% select(date_bal, department)) %>% 
  left_join(DF) %>% 
  arrange(department, date_bal) %>% 
  mutate(
    amount = if_else(is.na(amount), 0, amount),
    DIFF = interval(
      rollback(date_val, roll_to_first = TRUE), 
      rollback(date_bal, roll_to_first = TRUE)) %/% months(1)
  )