Cannot add dates of zero demand to zoo time series due to duplicate dates

Question

My apologies in advance for any non-compliance with the rules of posting a question. The data table below is a sample of what I want to convert into a time series.

> Materials
MaterialID  Date     Quantity
   1      2011-01-04      13
   1      2011-01-04      5
   2      2011-01-07      9
   3      2011-01-09      3
   3      2011-01-11      10

It consists of transaction entries for several Material items between 2011- 2014.The date range for the entire data set is from 4th Jan, 2011 - 31st Dec 2014. I want to create a transaction entry for each material within this period while accounting for the missing dates by setting the Quantity variable to zero for the missing dates. To put it another way the outcome I desire is that there will be an entry for each Material in the data set for every date between 4th Jan, 2011 - 31st Dec 2014 as shown below:

   Date    MaterialID_1  MaterialID_2 MaterialID_3
2011-01-04    13               0          0
2011-01-04    5                0          0
2011-01-05    0                0          0
2011-01-06    0                0          0
2011-01-07    0                9          0
2011-01-08    0                0          0
2011-01-09    0                0          3
2011-01-10    0                0          10
2011-01-11    0                0          0
    .         .                .          .
    .         .                .          .
    .         .                .          .
2014-12-31    0                0          0

I have tried some of the methods I have seen in the forum such as Add months of zero demand to zoo time series, but because I have duplicated dates I get the error, "index entries in ‘order.by’ are not unique". I'd appreciate any advise or help I can get with this.

After getting the data into this format, my intention is to reshape the data set to do batch forecasting. Thanks.

See dput code below:

dput(Data)
structure(list(MaterialID = c(1L, 1L, 2L, 3L, 1L), Date = c("2011-01-04", 
"2011-01-04", "2011-01-07", "2011-01-09", "2011-01-11"), Quantity = c(13L, 
5L, 9L, 3L, 10L)), .Names = c("MaterialID", "Date", "Quantity"
), class = "data.frame", row.names = c(NA, -5L))

Do not use images to display input data. If DF is the 9 rows shown show the output of dput(DF) in your question and also show what the expected output is. If the output would be too long change your question so it is not so long. Read minimal reproducible example . — G. Grothendieck
@G.Grothendieck. Thanks for the guidance. Still learning the ropes but I will make certain that my future posts and examples are more compliant with what is expected here. — Udy
@G.Grothendieck. Yes, this will be useful in such a situation. Right now, I just want to use the data to make a 12 month forecast. Thanks! — Udy
@G.Grothendieck I have tried to fix the question. Hope it looks better now? — Udy

Dirk Nachbar Dirk Nachbar · Accepted Answer · 2016-11-15T14:26:27

I am using expand.grid to get all combinations and then use a merge(). I am using random data here

df <- data.frame(materialid = rpois(10, 3), date = as.Date(seq(1, 365 * 4, length.out = 10), origin = '2011-01-01'), quantity = rpois(10, 100))

df2 <- expand.grid(unique(df$materialid), as.Date(min(df$date):max(df$date), origin = '1970-01-01'))
names(df2) <- c('materialid', 'date')

df2 <- merge(df2, df, by = c('materialid', 'date'), all.x = T)
df2$quantity[is.na(df2$quantity)] <- 0
summary(df2)

Cannot add dates of zero demand to zoo time series due to duplicate dates

2 Answers