
I have a CSV file with the following columns:

date, group, integer_value

The dates run from 01-January-2013 to 31-October-2015 for each of the 20 groups in the data.

I want to create a time series for each of the 20 groups. However, the dates are not continuous and have sporadic gaps, so

group4series <- ts(group4, frequency = 365.25, start = c(2013,1,1))

runs without error but is not correct because of the gaps in the data.

How can I use the 'date' column of the data to create the time series, instead of relying on the usual 'frequency' parameter of the 'ts()' function?

Thanks!

1
Please post a representative sample of your data with dput. - Maurits Evers

1 Answer


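You could use zoo::zoo instead of ts. A zoo object is indexed by the actual observation dates, so it does not assume regularly spaced data.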
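To see the difference, here is a minimal sketch with made-up dates and values (none of them come from the question). It assumes the zoo package is installed. ts() never looks at the dates and spaces the observations evenly, while zoo keeps each value attached to its own date:

```r
library(zoo)

# Made-up observations with a gap between 3 and 10 January
d <- as.Date(c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-10"))
v <- c(10, 12, 11, 15)

x_ts  <- ts(v, frequency = 365.25, start = c(2013, 1))
x_zoo <- zoo(v, d)

time(x_ts)          # evenly spaced: the 4th value sits three steps after the start
index(x_zoo)        # the actual dates, gap included
diff(index(x_zoo))  # spacing in days between consecutive observations
```

The last line is a quick way to check where the gaps in a series are.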

Since you don't provide sample data, let's generate daily data, and remove some days to introduce "gaps".

set.seed(2018)
dates <- seq(as.Date("2015/12/01"), as.Date("2016/07/01"), by = "1 day")
# Keep 100 randomly chosen days, in chronological order
dates <- sort(dates[sample(length(dates), 100)])

We construct a sample data.frame:

df <- data.frame(
    dates = dates,
    val = cumsum(runif(length(dates))))

To turn df into a zoo time series, you can do the following:

library(zoo)
# Named z rather than ts to avoid masking the name of the base ts() function
z <- with(df, zoo(val, dates))

Let's plot the time series:

plot.zoo(z)
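For the data described in the question, the same idea can be applied once per group. The following is only a sketch: the column names date, group and integer_value are taken from the question, but the small data.frame is a made-up stand-in for the CSV, and the date format string is an assumption that has to be adjusted to however the dates are written in the file.

```r
library(zoo)

# Hypothetical stand-in for the result of read.csv() on the real file
df <- data.frame(
  date = c("2013-01-01", "2013-01-03", "2013-01-01", "2013-01-02", "2013-01-07"),
  group = c("g1", "g1", "g2", "g2", "g2"),
  integer_value = c(5L, 7L, 2L, 9L, 4L)
)

# Assumed format; change it to match the dates in the CSV
df$date <- as.Date(df$date, format = "%Y-%m-%d")

# One zoo series per group, each indexed by its own dates
series <- lapply(split(df, df$group),
                 function(d) zoo(d$integer_value, d$date))

series$g2
```

series is a named list, so series$g2 (or series[["g2"]]) gives the zoo object for that group, and each group keeps its own set of dates.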
