
I am new to time series and was hoping someone could provide some input/ideas here.

I am trying to find ways to impute missing values.
I was hoping to use a moving average, but most of the packages (smooth, mgcv, etc.) don't seem to take the time intervals between observations into consideration.
For example, the dataset might look like the one below, and I would want the value at 2016-01-10 to have the greatest influence when calculating the missing value:

Date          Value    Diff_Days
2016-01-01    10       13           
2016-01-10    14       4
2016-01-14    NA       0
2016-01-28    30       14
2016-01-30    50       16

I have instances where NA might be the first or the last observation. Sometimes there are also several NA values, in which case the rolling window would need to expand, which is why I would like to use a moving average.
Is there a package that would take date intervals / separate weights into consideration?
Alternatively, please suggest a better way to impute NA values in such cases, if there is one.

Does your time series need to be smoother? If there is no new observation, you could substitute the missing value with the last observed value (e.g. for Date 2016-01-14 the value 14), as there is no real information gain/news/innovation in the time series. - Numb3rs
I would like to get the weighted average rather than imputing with the previous value. It's also possible that the previous observation was recorded in 2016-01 and the next values were recorded in 2016-08, then 2016-09, etc. - creativename

1 Answer


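You can fit a regression of Value on the date, for example with glm, and predict the missing values from it. Any other model would work the same way. Note that glm with its default gaussian family is an ordinary least-squares fit, so this imputes from a single straight line through all observed points rather than from a local weighted average.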

Input

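# Sample data, including a leading NA and two consecutive NAs
con <- textConnection("Date          Value    Diff_Days
2015-12-14    NA       0
2016-01-01    10       13
2016-01-10    14       4
2016-01-14    NA       0
2016-01-28    30       14
2016-02-14    NA       0
2016-02-18    NA       0
2016-02-29    50       16")

df <- read.table(con, header = TRUE)
close(con)
df$Date <- as.Date(df$Date)

# Regress Value on the date expressed as a number of days;
# rows with NA in Value are dropped from the fit by default
df$Date.numeric <- as.numeric(df$Date)
fit <- glm(Value ~ Date.numeric, data = df)

# Predict the missing values from the fitted line
df.na <- df[is.na(df$Value), ]
predicted <- predict(fit, df.na)
df$Value[is.na(df$Value)] <- predicted

# Plot all values and mark the imputed ones in red
plot(df$Date, df$Value)
points(df.na$Date, predicted, type = "p", col = "red")

# Remove the helper column and data frame
df$Date.numeric <- NULL
rm(df.na)
print(df)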

Output

(Plot of Value against Date, with the imputed points in red.)

        Date     Value Diff_Days
1 2015-12-14 -3.054184         0
2 2016-01-01 10.000000        13
3 2016-01-10 14.000000         4
4 2016-01-14 18.518983         0
5 2016-01-28 30.000000        14
6 2016-02-14 40.092149         0
7 2016-02-18 42.875783         0
8 2016-02-29 50.000000        16
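
If the nearest observations should have the greatest influence, as the question asks, one alternative to a global regression line is linear interpolation on the date axis. The following is only a sketch, not part of the approach above: it uses base R's approx(), and the data frame is the question's table with one leading and one trailing NA row added as an assumption for illustration. The column name Value_imputed is likewise made up for the example.

```r
# Illustrative data: the question's table plus a leading and a trailing NA
# (the first and last rows are assumptions added for this example).
df <- data.frame(
  Date  = as.Date(c("2015-12-14", "2016-01-01", "2016-01-10", "2016-01-14",
                    "2016-01-28", "2016-01-30", "2016-02-05")),
  Value = c(NA, 10, 14, NA, 30, 50, NA)
)

obs <- !is.na(df$Value)

# approx() interpolates linearly on the numeric date axis, so each gap is
# filled from its two observed neighbours, weighted by closeness in time.
# rule = 2 extends the nearest observed value outwards, so leading and
# trailing NAs are filled instead of staying NA.
filled <- approx(x    = as.numeric(df$Date[obs]),
                 y    = df$Value[obs],
                 xout = as.numeric(df$Date),
                 rule = 2)$y

df$Value_imputed <- filled
print(df)
```

Here the value for 2016-01-14 lies 4 days after 2016-01-10 and 14 days before 2016-01-28, so the earlier observation gets weight 14/18 and the later one 4/18. Runs of consecutive NAs are handled the same way, because every missing date is interpolated between the nearest observed dates on either side. Whether carrying the edge values outwards is acceptable for the first and last observations depends on the data.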