
I have the dataset below with half-hourly time series data.

Date <- c("2018-01-01 08:00:00", "2018-01-01 08:30:00", 
          "2018-01-01 08:59:59","2018-01-01 09:29:59")
Volume <- c(195, 188, 345, 123)
Dataset <- data.frame(Date, Volume)

I would like to know how to convert this data frame into a time series object so I can run time series analysis on it. How should I define the start and end dates, and what should the frequency be?


2 Answers


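I'm not sure exactly what you mean by "half-hourly data", since your timestamps aren't evenly spaced: 08:59:59 and 09:29:59 are one second short of the half hour. If you want to round them to the nearest half hour, we can adapt this solution to your case.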

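Dataset$Date <- as.POSIXct(Dataset$Date, tz = "UTC")  # parse the strings first
Dataset$Date <- as.POSIXct(round(as.numeric(Dataset$Date) / (30*60)) * (30*60),
                           origin = "1970-01-01", tz = "UTC")  # snap to nearest 30 min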
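A minimal, self-contained base-R sketch of the rounding step, which also shows why the data isn't strictly half-hourly to begin with. It assumes the timestamps carry no particular time zone, so UTC is used to avoid daylight-saving shifts; the variable names (`parsed`, `raw_gaps`, `half_hour`, `rounded`, `rounded_gaps`) are illustrative only.

```r
# Sample data from the question
Date <- c("2018-01-01 08:00:00", "2018-01-01 08:30:00",
          "2018-01-01 08:59:59", "2018-01-01 09:29:59")
Volume <- c(195, 188, 345, 123)
Dataset <- data.frame(Date, Volume)

# Parse the strings before doing arithmetic; in R >= 4.0, as.double() on the
# raw character strings just returns NA. UTC is an assumption that avoids
# daylight-saving shifts.
parsed <- as.POSIXct(Dataset$Date, tz = "UTC")

# Gaps in seconds between consecutive readings
raw_gaps <- diff(as.numeric(parsed))
raw_gaps  # 1800 1799 1800

# Round to the nearest half hour on the seconds-since-epoch scale
half_hour <- 30 * 60
rounded <- as.POSIXct(round(as.numeric(parsed) / half_hour) * half_hour,
                      origin = "1970-01-01", tz = "UTC")
rounded_gaps <- diff(as.numeric(rounded))

format(rounded, "%H:%M:%S")  # "08:00:00" "08:30:00" "09:00:00" "09:30:00"
rounded_gaps                 # 1800 1800 1800
```

Working on plain epoch seconds keeps the arithmetic exact; the one-second 1799 gap is what makes the raw data irregular.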

If you don't want to round, just parse the strings:

Dataset$Date <- as.POSIXct(Dataset$Date)

Either way, the Date column must be a date-time class such as "POSIXct" (or "POSIXlt") rather than character, so that e.g.:

> class(Dataset$Date)
[1] "POSIXct" "POSIXt"

Then we can convert the data into a time series with the xts package:

library(xts)
Dataset.xts <- xts(Dataset$Volume, order.by=Dataset$Date)

Result (rounded case):

> Dataset.xts
                    [,1]
2018-01-01 08:00:00  195
2018-01-01 08:30:00  188
2018-01-01 09:00:00  345
2018-01-01 09:30:00  123
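Rounding makes the spacing uniform, but treating the data as a regular series also requires that no half-hour slot is missing. A base-R sketch of that check, assuming UTC timestamps; the values are typed in by hand to mirror the rounded result above, and `stamps`, `grid` and `missing_slots` are illustrative names:

```r
# Rounded timestamps, mirroring the result above (UTC assumed)
stamps <- as.POSIXct(c("2018-01-01 08:00:00", "2018-01-01 08:30:00",
                       "2018-01-01 09:00:00", "2018-01-01 09:30:00"), tz = "UTC")

# Every half-hour slot from the first to the last observation
grid <- seq(from = min(stamps), to = max(stamps), by = "30 min")

# Slots with no observation; compare as numbers to avoid string formatting issues
missing_slots <- grid[!as.numeric(grid) %in% as.numeric(stamps)]

length(grid)           # 4 expected half-hour slots
length(missing_slots)  # 0, so the series is regular and gap-free
```

If `missing_slots` is not empty, those half-hours would need to be filled (for example with NA) before the data can be treated as a regular half-hourly series.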

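You can use dplyr and lubridate (from the tidyverse) to parse the Date column into a POSIXct date-time, then convert the values to a time series with ts(), where you set parameters such as start, end and frequency. The frequency is the number of observations per unit of time, e.g. 48 for half-hourly data with a daily cycle.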

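library(dplyr)
library(lubridate)
Dataset2 <- Dataset %>%
  mutate(Date = ymd_hms(as.character(Date)))  # parse strings to POSIXct (UTC)
# ts() only stores the values; 48 half-hours per day, 08:00 is the 17th slot of day 1
Volume.ts <- ts(Dataset2$Volume, start = c(1, 17), frequency = 48)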
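For half-hourly data, a common choice (an assumption here, not the only option) is frequency = 48, i.e. one seasonal cycle per day. A base-R sketch that derives the matching start value from the first timestamp instead of working it out by hand; it assumes the timestamps are already rounded and in UTC, and `first`, `slot` and `volume_ts` are illustrative names:

```r
Date <- c("2018-01-01 08:00:00", "2018-01-01 08:30:00",
          "2018-01-01 09:00:00", "2018-01-01 09:30:00")  # already rounded
Volume <- c(195, 188, 345, 123)

first <- as.POSIXlt(Date[1], tz = "UTC")
# Position of the first reading within its day: 00:00 is slot 1, 00:30 is slot 2, ...
slot <- first$hour * 2 + first$min %/% 30 + 1

# Day 1, slot 17 (08:00); ts() infers the end from the number of observations
volume_ts <- ts(Volume, start = c(1, slot), frequency = 48)
start(volume_ts)  # 1 17
end(volume_ts)    # 1 20
```

Leaving `end` out lets ts() compute it from the data, which avoids silently truncating the series when the end date is mis-specified.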

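See ?ts for more details on the parameters. Personally, I think zoo and xts provide a better framework for this kind of data, because they index observations by their actual timestamps rather than by a regular numeric time index.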