So here's my problem: I have raw daily interest-rate data for 2010 to 2019, but several dates are missing. These are the last rows of the file:
```
          Date Price
1244  9-Jul-10 5.053
1245  8-Jul-10 5.007
1246  7-Jul-10 4.991
1247  6-Jul-10 4.976
1248 28-Jun-10 4.850
1249 21-Jun-10 4.900
1250 18-Jun-10 5.000
1251 14-Jun-10 3.800
1252  9-Jun-10 3.850
1253  1-Jun-10 3.950
1254 31-May-10 3.950
```
When I import the data into R, it shows 1254 rows, which is the number of observations I actually have:
```
> interest <- read.csv("C:/Users/SOOGRIM/Desktop/Interest4.csv", header = TRUE, stringsAsFactors = FALSE)
> interest
       Date Price
1 21-Jan-19 3.550
2 20-Jan-19 3.550
3 19-Jan-19 3.550
4 18-Jan-19 3.550
5 17-Jan-19 3.630
> summary(interest)
     Date               Price             X
 Length:1254        Min.   :0.861   Min.   : 1.000
 Class :character   1st Qu.:2.400   1st Qu.: 1.000
 Mode  :character   Median :2.900   Median : 2.000
                    Mean   :3.000   Mean   : 3.031
                    3rd Qu.:3.670   3rd Qu.: 6.000
                    Max.   :5.674   Max.   :10.000
                                    NA's   :1222
```
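As the summary shows, `read.csv()` leaves `Date` as plain character, so R cannot yet tell which days are absent. A minimal base-R sketch of the first step, assuming every date looks like `9-Jul-10` and using a few of the rows shown above in place of the real CSV, converts the column to `Date` and sorts it oldest-first (the file lists the newest date first):

```r
# Sketch only: a few rows copied from the question stand in for read.csv().
interest <- data.frame(
  Date  = c("9-Jul-10", "8-Jul-10", "7-Jul-10", "6-Jul-10", "28-Jun-10"),
  Price = c(5.053, 5.007, 4.991, 4.976, 4.850),
  stringsAsFactors = FALSE
)

# %b matches English month abbreviations only in an English/C time locale.
invisible(Sys.setlocale("LC_TIME", "C"))

# "%d-%b-%y": day, abbreviated month name, two-digit year ("10" -> 2010).
interest$Date <- as.Date(interest$Date, format = "%d-%b-%y")

# The file is newest-first; sort oldest-first for time-series work.
interest <- interest[order(interest$Date), ]
rownames(interest) <- NULL

print(interest)
```

If any value fails to match the format, `as.Date()` returns `NA` for it, so checking `anyNA(interest$Date)` after the conversion is a cheap sanity test.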
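However, when I convert it to a time series, it ends up with 3281 values instead of 1254. `ts()` creates one slot per period between `start` and `end` and fills the extra slots by recycling the observed prices, not by interpolating them: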
```
> interest.ts <- ts(data = interest$Price, frequency = 365, start = c(2010, 06), end = c(2019, 01))
> summary(interest.ts)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.861   2.450   2.900   3.001   3.680   5.674
> length(interest.ts)
[1] 3281
```
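The 3281 can be reproduced from the `ts()` arguments alone. This base-R sketch (the names `n_slots`, `small` and `start_vec` are illustrative) shows that the length comes from `start`, `end` and `frequency`, that shorter data are recycled rather than interpolated, and one way to turn a calendar date into the `c(year, period)` form `ts()` expects, assuming 31 May 2010 is the first observation as in the rows above:

```r
# With frequency = 365, c(2010, 6) is the 6th daily period of 2010 (6 January,
# not June) and c(2019, 1) is the 1st period of 2019, so the series has
# (2019 - 2010) * 365 + (1 - 6) + 1 = 3281 slots, whatever the data length.
n_slots <- length(ts(seq_len(1254), frequency = 365,
                     start = c(2010, 6), end = c(2019, 1)))
print(n_slots)

# Data shorter than the span are recycled from the start: 3 values, 5 slots.
small <- ts(c(10, 20, 30), frequency = 365, start = c(2010, 6), end = c(2010, 10))
print(as.numeric(small))

# Converting a real first date to c(year, day-of-year); yday is 0-based.
first_day <- as.POSIXlt(as.Date("2010-05-31"))
start_vec <- c(first_day$year + 1900, first_day$yday + 1)
print(start_vec)
```

Note that `frequency = 365` has no slot for 29 February, so day positions drift by one after each leap year (2012, 2016) in a 2010 to 2019 series.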
This affects my forecast of the interest rate.
I want to identify the missing dates in my daily values and insert them automatically in R. I have looked into the imputeTS and lubridate packages, but I don't know which function to use to fill in the missing dates and show NA for the "Price" variable on those dates.
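Identifying the missing dates does not need either package. A base-R sketch, using the dates from the sample rows at the top of the question and assuming every calendar day (weekends included, as in the January 2019 rows) should be present, compares the observed dates against a complete daily sequence:

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # so %b parses "Jul", "Jun", "May"

# Dates from the sample rows at the top of the question.
observed <- as.Date(c("9-Jul-10", "8-Jul-10", "7-Jul-10", "6-Jul-10",
                      "28-Jun-10", "21-Jun-10", "18-Jun-10", "14-Jun-10",
                      "9-Jun-10", "1-Jun-10", "31-May-10"),
                    format = "%d-%b-%y")

# Every calendar day from the earliest to the latest observation.
all_days <- seq(min(observed), max(observed), by = "day")

# Calendar days that never appear in the data (indexing keeps the Date class).
missing_days <- all_days[!all_days %in% observed]

length(missing_days)
head(missing_days)
```

For these 11 rows the range covers 40 days, so 29 are missing. If the rate is only published on business days, `all_days` would need weekends filtered out first.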
Then I will use a suitable interpolation method from the imputeTS package to fill in the "Price" values.
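To illustrate that interpolation step without extra packages, this sketch swaps in base R's `stats::approx()` for imputeTS. The series is hypothetical: the prices for 31 May to 9 June 2010 from the question, with the seven missing days inserted as `NA`. With imputeTS installed, `imputeTS::na_interpolation(price, option = "linear")` should give a comparable linear fill (older imputeTS releases named it `na.interpolation`, so check your installed version):

```r
# Hypothetical series: 31 May .. 9 Jun 2010, missing days already set to NA.
price <- c(3.950, 3.950, NA, NA, NA, NA, NA, NA, NA, 3.850)

idx <- seq_along(price)   # position of each day in the daily series
ok  <- !is.na(price)      # days that have an observed rate

# Straight-line interpolation between neighbouring observed points,
# evaluated at every position (observed values are returned unchanged).
price_filled <- approx(x = idx[ok], y = price[ok], xout = idx)$y

round(price_filled, 4)
```

Each gap day moves the rate by (3.850 - 3.950) / 8 = -0.0125 here; `approx()` returns `NA` outside the observed range by default, so leading or trailing gaps would stay unfilled.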
In short, I just want to know how to add the missing dates automatically in R. With over 1000 dates missing, doing it manually in Excel is tedious.
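Adding the missing dates amounts to a left join against a complete daily calendar. A minimal base-R sketch, using the sample rows from the top of the question in place of the CSV and again assuming every calendar day should appear:

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # English month abbreviations for %b

# Sample rows from the question standing in for the real CSV.
interest <- data.frame(
  Date  = c("9-Jul-10", "8-Jul-10", "7-Jul-10", "6-Jul-10", "28-Jun-10",
            "21-Jun-10", "18-Jun-10", "14-Jun-10", "9-Jun-10", "1-Jun-10",
            "31-May-10"),
  Price = c(5.053, 5.007, 4.991, 4.976, 4.850, 4.900, 5.000, 3.800,
            3.850, 3.950, 3.950),
  stringsAsFactors = FALSE
)
interest$Date <- as.Date(interest$Date, format = "%d-%b-%y")

# A calendar with one row per day over the observed range.
calendar <- data.frame(Date = seq(min(interest$Date), max(interest$Date), by = "day"))

# Left join: every calendar day is kept; Price is NA where the date was missing.
# merge() sorts the result by the key column (Date) by default.
filled <- merge(calendar, interest, by = "Date", all.x = TRUE)

nrow(filled)
sum(is.na(filled$Price))
head(filled)
```

The resulting `Price` column, with NA on the inserted days and dates in ascending order, is the kind of input imputeTS's interpolation functions expect.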
`is.na`? – G5W

`is.na` or `is.nan` – Kevin Cazelles