0
votes

I have a data frame with a POSIXct column and a data column (in this case rain as example). Currently the POSIXct time series is irregular (i.e. sometimes it is every second sometimes every three or two seconds). In the real dataset the rain column makes more sense here I just created a random bunch of numbers.

Setting up a regularly interval df as example

df  <- data.frame(Time = seq.POSIXt(from = as.POSIXct("2018-09-04 
10:56:12"), to = as.POSIXct("2018-09-04 10:57:12"), by = "sec"), rain = 
rnorm(61,2,3))

Sampling df so we have an irregular time series which is what actual data looks like

df <- data.frame(Time = df[sample(nrow(df),30),])

However, I need to have a regularly spaced time series (every 2 seconds). Perhaps by taking the starting and ending POSIXct data points and then interpolating the data from from the rain column corresponding to the POSIXct intervals that have been created.

So essentially, if we were to start on 10:56:12 the next time step would be at 10:56:14. The script would ideally check if 14 exists and if not would create a suitable time stamp as well as interpolate between the two nearest rain data points.

In this hypothetical example the 10:56:14 didn't exist in the df. It has now been created and the rain datapoint was interpolated given the two nearest neighbours (average).

Time                rain
2018-09-04 10:56:12 1.309069
2018-09-04 10:56:14 2.731635            
2018-09-04 10:56:16 4.154202

I hope this is sufficiently clear. Let me know if you need any further info.

1
Looks like you need complete with seq from first and last time points - akrun

1 Answers

0
votes

This aproximation can be good:

First I created a dataframe with all the timestamps in the desired interval.

start  = as.POSIXct("2018-09-04 10:56:12")
end    = as.POSIXct("2018-09-04 10:57:12")
step = 3 #seconds (can be 2 seconds too!)

dummy.df  <- seq(start, end, step) 
dummy.df  <- as.data.frame(dummy.df)
colnames(dummy.df) <- c("time")

It looks like this:

                  time
1  2018-09-04 10:56:12
2  2018-09-04 10:56:15
3  2018-09-04 10:56:18
4  2018-09-04 10:56:21
5  2018-09-04 10:56:24
6  2018-09-04 10:56:27
7  2018-09-04 10:56:30
8  2018-09-04 10:56:33
9  2018-09-04 10:56:36
10 2018-09-04 10:56:39
11 2018-09-04 10:56:42
12 2018-09-04 10:56:45
13 2018-09-04 10:56:48
14 2018-09-04 10:56:51
15 2018-09-04 10:56:54
16 2018-09-04 10:56:57
17 2018-09-04 10:57:00
18 2018-09-04 10:57:03
19 2018-09-04 10:57:06
20 2018-09-04 10:57:09
21 2018-09-04 10:57:12

Then I created a broken dataframe to simulate your problem:

#Lets say this is our original broken df, (with loses)
original.df <- dummy.df
original.df$V2 <- rnorm(nrow(original.df), 2,3) #Some fake data
original.df <- original.df[-c(5,6,10,15,16), ] #Introduce lost observations
colnames(original.df) <- c("time", "rain")

Looks like this

                  time       rain
1  2018-09-04 10:56:12  2.0962818
2  2018-09-04 10:56:15 -4.6536048
3  2018-09-04 10:56:18  6.2995354
4  2018-09-04 10:56:21 -0.2378457
7  2018-09-04 10:56:30  3.8020177
8  2018-09-04 10:56:33 -5.7051945
9  2018-09-04 10:56:36  8.8737618
11 2018-09-04 10:56:42  3.9543532
12 2018-09-04 10:56:45 -3.8187967
13 2018-09-04 10:56:48  2.6197517
14 2018-09-04 10:56:51  8.3323757
17 2018-09-04 10:57:00  0.5615195
18 2018-09-04 10:57:03  1.4699787
19 2018-09-04 10:57:06 11.1171002
20 2018-09-04 10:57:09  0.7372780
21 2018-09-04 10:57:12  0.3441350

Next I merged the dataframes:

#Merge 
df.merged<-merge(original.df, dummy.df, by = "time", all = T)
colnames(df.merged) <- c("time", "rain")

And I get:

                  time       rain
1  2018-09-04 10:56:12  0.1243183
2  2018-09-04 10:56:15 -1.4254425
3  2018-09-04 10:56:18  3.1908882
4  2018-09-04 10:56:21  2.5210609
5  2018-09-04 10:56:24         NA
6  2018-09-04 10:56:27         NA
7  2018-09-04 10:56:30 -1.7984547
8  2018-09-04 10:56:33  1.0511725
9  2018-09-04 10:56:36  2.2908755
10 2018-09-04 10:56:39         NA
11 2018-09-04 10:56:42 -1.1524531
12 2018-09-04 10:56:45  1.2960848
13 2018-09-04 10:56:48  4.3809223
14 2018-09-04 10:56:51  0.1184004
15 2018-09-04 10:56:54         NA
16 2018-09-04 10:56:57         NA
17 2018-09-04 10:57:00 -3.3740135
18 2018-09-04 10:57:03 -7.1360918
19 2018-09-04 10:57:06  1.7078348
20 2018-09-04 10:57:09 -1.8903507
21 2018-09-04 10:57:12  2.5735402

Finally I interpolate with a linear method using na.approx :

#Interpolate NAs
df.merged$rain<-na.approx(df.merged$rain, method = "linear", rule=2)

And this is the desired output:

                  time       rain
1  2018-09-04 10:56:12  0.1243183
2  2018-09-04 10:56:15 -1.4254425
3  2018-09-04 10:56:18  3.1908882
4  2018-09-04 10:56:21  2.5210609
5  2018-09-04 10:56:24  1.0812224
6  2018-09-04 10:56:27 -0.3586162
7  2018-09-04 10:56:30 -1.7984547
8  2018-09-04 10:56:33  1.0511725
9  2018-09-04 10:56:36  2.2908755
10 2018-09-04 10:56:39  0.5692112
11 2018-09-04 10:56:42 -1.1524531
12 2018-09-04 10:56:45  1.2960848
13 2018-09-04 10:56:48  4.3809223
14 2018-09-04 10:56:51  0.1184004
15 2018-09-04 10:56:54 -1.0457376
16 2018-09-04 10:56:57 -2.2098755
17 2018-09-04 10:57:00 -3.3740135
18 2018-09-04 10:57:03 -7.1360918
19 2018-09-04 10:57:06  1.7078348
20 2018-09-04 10:57:09 -1.8903507
21 2018-09-04 10:57:12  2.5735402