
I have a monthly time series in which the months are not all consecutive, i.e. data is missing for some months. How can I fill in appropriate values between the starting and ending dates? Please note that I don't set a specific date range, because the starting and ending dates are determined by the Date column retrieved from a data table. For example, my data looks like this:

Date={2016-3-1, 2016-8-1, 2016-9-1, 2017-3-1, 2017-6-1}
Price={111,122,124,142,134}

My expected output is

Date={2016-3-1,2016-4-1,2016-5-1,2016-6-1,.......2017-6-1}, 
Price={111,112,113......134}

(Here I just filled in some dummy numbers. Could anyone please suggest the best way to fill in the numbers?)

Many thanks!!

Hard to answer without specifics. Is simple linear interpolation good enough? - user2974951
This is for modeling the price trend of a certain product; the purpose is to reflect the true trend over the missing months. I'm not sure what a good way to fill in the data is. Any comments or advice are welcome! Thanks! - Cherry

1 Answer


If the interpolated values don't have to be integers, you could do something like this:

df <- data.frame(Date=as.Date(c('2016-3-1', '2016-8-1', '2016-9-1', '2017-3-1','2017-6-1'), format='%Y-%m-%d'), 
                 Price=c(111,122,124,142,134))

This is your current data. You can then extract the first and last dates to create a full range of dates between these two:

# min()/max() give the first and last dates even if the rows are not sorted by Date
firstDate <- min(df$Date)
lastDate <- max(df$Date)
allDates <- data.frame(Date = seq.Date(firstDate, lastDate, by = 'month'))

Then you merge the original data with this set of all dates:

fulldf <- merge(df, allDates, by = 'Date', all = TRUE)

Note that NA appears in the Price column for every date that has no original value.
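To see which months actually need filling, a quick check like the following can help. It is only a sketch that rebuilds the example data from the question so it runs on its own; the name `missingMonths` is made up for illustration.

```r
# Rebuild the example data so this snippet is self-contained
df <- data.frame(
  Date  = as.Date(c("2016-03-01", "2016-08-01", "2016-09-01",
                    "2017-03-01", "2017-06-01")),
  Price = c(111, 122, 124, 142, 134)
)

# Full monthly sequence between the earliest and latest date
allDates <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "month"))

# Outer join: months without an original observation get Price = NA
fulldf <- merge(df, allDates, by = "Date", all = TRUE)

# Months that still need a value (hypothetical helper variable)
missingMonths <- fulldf$Date[is.na(fulldf$Price)]

nrow(fulldf)           # total number of months in the range
length(missingMonths)  # number of months to be filled
```

For the example data, the range March 2016 to June 2017 covers 16 months, of which 5 have an original price.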

Now you can use the stinepack package, for example, to interpolate the missing data. Stineman interpolation is said to be less prone to oscillations than splines.

library(stinepack)
fulldf$Price <- na.stinterp(fulldf$Price, along = fulldf$Date)

Note that the interpolated data are no longer integers. You can round them to the nearest integer, if you like.
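If simple linear interpolation (as raised in the comments) turns out to be good enough, base R's `approx()` can do it without any extra package. The following is a minimal sketch, not a recommendation of one method over the other; it rebuilds the example data so it runs on its own, and the names `lineardf` and `known` are made up for illustration.

```r
# Rebuild the example data and the full monthly grid
df <- data.frame(
  Date  = as.Date(c("2016-03-01", "2016-08-01", "2016-09-01",
                    "2017-03-01", "2017-06-01")),
  Price = c(111, 122, 124, 142, 134)
)
allDates <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "month"))
lineardf <- merge(df, allDates, by = "Date", all = TRUE)

# Rows that carry an original observation
known <- !is.na(lineardf$Price)

# Linear interpolation along the actual dates (converted to day counts),
# so months of different lengths are weighted by their real spacing
lineardf$Price <- approx(
  x    = as.numeric(lineardf$Date[known]),
  y    = lineardf$Price[known],
  xout = as.numeric(lineardf$Date)
)$y

# Optional: round to whole numbers if integer prices are wanted
lineardf$PriceRounded <- round(lineardf$Price)
```

Linear interpolation never overshoots the neighbouring observations, which may or may not match the "true trend" you are after; that is a modeling decision rather than something the code can settle.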