Missing Values in Raw data

Question

So here's my problem: I have raw daily interest rate data for the years 2010 to 2019. However, several dates are missing.

1244  9-Jul-10 5.053
1245  8-Jul-10 5.007
1246  7-Jul-10 4.991
1247  6-Jul-10 4.976
1248 28-Jun-10 4.850
1249 21-Jun-10 4.900
1250 18-Jun-10 5.000
1251 14-Jun-10 3.800
1252  9-Jun-10 3.850
1253  1-Jun-10 3.950
1254 31-May-10 3.950

When I import the data into R, it shows 1254 rows, which is the number of observations I actually have.

interest <- read.csv("C:/Users/SOOGRIM/Desktop/Interest4.csv", header = TRUE, stringsAsFactors = FALSE)

interest
       Date Price
1 21-Jan-19 3.550
2 20-Jan-19 3.550
3 19-Jan-19 3.550
4 18-Jan-19 3.550
5 17-Jan-19 3.630

summary(interest)
     Date               Price             X
 Length:1254        Min.   :0.861   Min.   : 1.000
 Class :character   1st Qu.:2.400   1st Qu.: 1.000
 Mode  :character   Median :2.900   Median : 2.000
                    Mean   :3.000   Mean   : 3.031
                    3rd Qu.:3.670   3rd Qu.: 6.000
                    Max.   :5.674   Max.   :10.000
                                    NA's   :1222

However, when I convert it to a time series, it interpolates the data for the missing dates and the result has a total of 3281 values.

interest.ts <- ts(data = interest$Price, frequency = 365, start = c(2010, 6), end = c(2019, 1))

summary(interest.ts)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.861   2.450   2.900   3.001   3.680   5.674
length(interest.ts)
[1] 3281

This affects my forecast of the interest rate.

I want to identify the missing dates in my daily values and fill them in automatically in R. I have looked into the imputeTS and lubridate packages, but I don't know which function to use to rebuild the full sequence of dates and show NaN for the "price" variable on the dates that were missing.

Then I will use a suitable interpolation method from the imputeTS package to interpolate the values of the "price" variable.

In short, I just want to know how to add the missing dates automatically in R. Since over 1000 dates are missing, it is tedious to do this manually in Excel.

It is very difficult to understand your problem without an example : stackoverflow.com/questions/5963269/… — Bulat

Steffen Moritz · Accepted Answer · 2019-11-16T00:15:39

I did not completely understand your problem, but I think this is a case of implicit missing values.

You have a time series in which some dates are missing completely: they are not marked as NA, they simply do not appear in the series at all. (The NA values are therefore only implicit.)
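This would also account for the 3281 in the question. As far as base R's ts() goes, it has no notion of calendar dates: it takes a plain vector, works out how many observations fit between start and end at the given frequency, and recycles the input when it is shorter than that. A minimal sketch with three made-up prices (not the real data):

```r
# Three made-up prices standing in for the 1254 real ones
price <- c(1, 2, 3)

# Same call shape as in the question
y <- ts(price, frequency = 365, start = c(2010, 6), end = c(2019, 1))

length(y)            # 3281, the same length as in the question
head(as.numeric(y))  # 1 2 3 1 2 3: the input repeats, nothing is interpolated
```

So the extra values are repeats of the existing prices, which is why the dates have to be restored before building the series.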

You can solve this with a combination of the imputeTS and tsibble packages.

library(imputeTS)
library(tsibble)

# Convert your time series or data.frame into a tsibble time series object
# (for a data.frame, the date column needs to be of class Date so that
# tsibble can use it as the index)
x <- as_tsibble(your_timeseries)

# Make the implicit missing values explicit: afterwards the missing dates
# are present as rows with NA
x <- fill_gaps(x)

# Perform the time series imputation
x <- na.kalman(x)

Here the tsibble package is used for adding the implicit missing values as actual NA values. Afterwards imputeTS is used to perform the time series imputation (replacing the NA values).
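As a rough end-to-end sketch of these steps, here are five rows copied from the question, with the column names Date and Price as shown there. Two assumptions: the date format string "%d-%b-%y" relies on an English locale for the month abbreviations, and na_interpolation is the imputeTS 3.0+ spelling of na.interpolation, applied here to the Price column only.

```r
library(tsibble)
library(imputeTS)

# Five rows from the question; 29-Jun-10 to 5-Jul-10 are absent
interest <- data.frame(
  Date  = c("9-Jul-10", "8-Jul-10", "7-Jul-10", "6-Jul-10", "28-Jun-10"),
  Price = c(5.053, 5.007, 4.991, 4.976, 4.850),
  stringsAsFactors = FALSE
)

# Parse the text dates (%b depends on the locale) and sort oldest first
interest$Date <- as.Date(interest$Date, format = "%d-%b-%y")
interest <- interest[order(interest$Date), ]

# Declare the Date column as the time index
x <- as_tsibble(interest, index = Date)

# Insert one row per missing day, with Price set to NA
x <- fill_gaps(x)

# Replace the NA prices by linear interpolation
x$Price <- na_interpolation(x$Price)
```

After fill_gaps the table has one row per calendar day between the first and last date, so the imputation step sees the gaps it is supposed to fill.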

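If you only need a simple imputation such as the mean, you can do this directly with fill_gaps. Otherwise, use one of the imputeTS functions (e.g. na.kalman, na.interpolation, na.seadec, na.ma; since imputeTS 3.0 these are spelled with underscores: na_kalman, na_interpolation, na_seadec, na_ma).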
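For the simple case, fill_gaps accepts name-value pairs that say what to put into the new rows. A small sketch with made-up column names Date and Price and three values taken from the question:

```r
library(tsibble)

# Three observations with a gap between 28 June and 6 July 2010
interest <- data.frame(
  Date  = as.Date(c("2010-06-28", "2010-07-06", "2010-07-07")),
  Price = c(4.850, 4.976, 4.991)
)

x <- as_tsibble(interest, index = Date)

# Add the missing days and set their Price to the mean of the observed prices
x <- fill_gaps(x, Price = mean(Price))
```

A constant fill like this flattens the series across the gap, so for forecasting an interpolation from imputeTS is usually the more sensible choice.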
