0
votes

I have a long time series 'obs' with a 1-hour time step (class = "zoo"). Some missing values have already been removed, so the time step is no longer consistent.

> head(obs)
               time obs   
2009-12-22 01:00:00 23.708
2009-12-22 02:00:00 23.708
2009-12-22 03:00:00 23.708
2009-12-22 04:00:00 23.708
2009-12-22 06:00:00 23.708
2009-12-22 07:00:00 23.708

> tail(obs)
               time obs 
2013-09-22 21:00:00 45.031
2013-09-22 22:00:00 45.031
2013-09-22 23:00:00 41.589
2013-09-23 00:00:00 28.987
2013-09-23 01:00:00 22.238
2013-09-23 02:00:00 20.533

Now, from this time series I want to create multiple time series with a time step of 12 hours, one starting at each hour, so there should be 12 time series in total. One of the expected outputs (the one that starts at 01:00:00) is given below:

               time obs
2009-12-22 01:00:00 23.708
2009-12-22 13:00:00 23.708
2009-12-23 01:00:00 23.708
2009-12-23 13:00:00 24.136
2009-12-24 01:00:00 23.708
2009-12-24 13:00:00 23.708
....

In the same way, I need to create the other time series (starting at 02:00:00, 03:00:00 and so on), each with a 12-hour time step. If the time step were consistent, I could arrange the data into rows of 12 hourly values and then simply extract each column. But that is not possible now. How can I do it? I am already using the xts package, but I couldn't find a way.

3
Did any of the answers help you? - majom

3 Answers

1
votes

xts is the right package. What you are interested in is the function

[.xts (Extract subsets of xts Objects)

For example:

obs["T01:00/T01:59"]

will return all the observations whose time of day (the "T" part) is between 01:00 and 01:59.

You just need to apply this to every hour. Putting it all together, you could get something similar to this:

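# Return the observations at hour i and hour i + 12 (i in 0:11) as one series.
# obs must be an xts object; a zoo series can be converted with as.xts().
my_func <- function(i, obs){
   # time-of-day ranges in xts subsetting syntax, e.g. "T01:00/T01:59"
   hours    <- sprintf("T%02d:00/T%02d:59", i, i)
   hours.12 <- sprintf("T%02d:00/T%02d:59", i + 12, i + 12)
   # combine both subsets into a single xts object and return it
   rbind(obs[hours], obs[hours.12])
}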
# get a list of 12 subsets as requested
obs.subsetted <- lapply(0:11, my_func, obs)
1
votes

Here is a solution using data.table and lubridate.

The entire code snippet takes less than 0.01 seconds on my laptop.

# Load packages
library(lubridate)
library(data.table)

# Set up data
time <- seq(ymd_hms("2009-12-22 01:00:00"), ymd_hms("2013-09-23 02:00:00"), by="1 hour")
obs <- abs(rnorm(length(time)))
dt <- data.table(time, obs)

# Set up a list where all 12 output data tables are stored
l <- vector(12, mode="list")

# Split original data
for (i in 0:11){
  l[[i+1]] <- dt[seq(from=i+1, to=nrow(dt), by=12)]
}

The output data looks like this:

> l
[[1]]
                     time        obs
   1: 2009-12-22 01:00:00 1.14244266
   2: 2009-12-22 13:00:00 1.13037973
   3: 2009-12-23 01:00:00 0.18268572
   4: 2009-12-23 13:00:00 0.56539405
   5: 2009-12-24 01:00:00 0.06480253
  ---                               
2739: 2013-09-21 01:00:00 1.06874026
2740: 2013-09-21 13:00:00 0.04367871
2741: 2013-09-22 01:00:00 0.43790836
2742: 2013-09-22 13:00:00 1.41966787
2743: 2013-09-23 01:00:00 0.68687465

[[2]]
                     time       obs
   1: 2009-12-22 02:00:00 1.6789682
   2: 2009-12-22 14:00:00 0.1321111
   3: 2009-12-23 02:00:00 2.5129179
   4: 2009-12-23 14:00:00 0.9818898
   5: 2009-12-24 02:00:00 0.6617939
  ---                              
2739: 2013-09-21 02:00:00 0.6028943
2740: 2013-09-21 14:00:00 0.4571396
2741: 2013-09-22 02:00:00 0.7017483
2742: 2013-09-22 14:00:00 0.1206088
2743: 2013-09-23 02:00:00 0.3864518

[[3]]
                     time        obs
   1: 2009-12-22 03:00:00 2.14461926
   2: 2009-12-22 15:00:00 0.68896644
   3: 2009-12-23 03:00:00 0.19332982
   4: 2009-12-23 15:00:00 1.09463684
   5: 2009-12-24 03:00:00 0.60102308
  ---                               
2738: 2013-09-20 15:00:00 0.36922591
2739: 2013-09-21 03:00:00 0.89973806
2740: 2013-09-21 15:00:00 0.02761852
2741: 2013-09-22 03:00:00 0.17313669
2742: 2013-09-22 15:00:00 0.61018630

[[4]]
...
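
Note that taking every 12th row, as in the loop above, only works while no hours are missing. The question states that some observations were removed, so a variant that selects rows by hour of day may be safer. The following is a minimal sketch under that assumption; the sample data, the dropped rows and the name `l_by_hour` are made up for illustration.

```r
library(lubridate)
library(data.table)

# Hypothetical hourly data (4 days) with three rows dropped to mimic the gaps
time <- seq(ymd_hms("2009-12-22 01:00:00"), by = "1 hour", length.out = 96)
dt <- data.table(time, obs = seq_along(time))[-c(5, 30, 31)]

# One table per h in 0:11, holding the observations at hours h and h + 12.
# Rows are chosen by clock hour, so missing rows do not shift the pattern.
l_by_hour <- lapply(0:11, function(h) dt[hour(time) %% 12 == h])
names(l_by_hour) <- sprintf("%02d:00", 0:11)
```

Here `l_by_hour[["01:00"]]` holds the 01:00 and 13:00 observations, which corresponds to the first expected output in the question. Unlike the loop above, the list is ordered by clock hour rather than by the first row of the data.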
1
votes

After a long search, I found this straightforward method in the xts package:

 obs[.indexhour(obs) %in% c(t1, t2)]

This extracts all observations at hours t1 and t2 of each day. For more details, see ?indexClass in the xts package.
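
To get all 12 series at once, the same idea can be wrapped in lapply. This is only a sketch: the sample series, the removed rows and the name `series` are assumptions for illustration, and it assumes `.indexhour` reports hours in the time zone of the index (UTC here).

```r
library(xts)

# Hypothetical hourly series (4 days) with a few hours removed to mimic the gaps
idx <- seq(as.POSIXct("2009-12-22 01:00:00", tz = "UTC"), by = "hour", length.out = 96)
obs <- xts(seq_along(idx), order.by = idx)
obs <- obs[-c(5, 30, 31)]

# One 12-hourly series per starting hour h in 0:11 (hours h and h + 12)
series <- lapply(0:11, function(h) obs[.indexhour(obs) %in% c(h, h + 12)])
names(series) <- sprintf("%02d:00", 0:11)
```

`series[["01:00"]]` then contains the observations at 01:00 and 13:00 of each day, whether or not other hours are missing.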