
Let's say I have a data frame with lots of values under these headers:

df <- data.frame(c("Tid", "Value"))
# Tid format: "%Y-%m-%d %H:%M"

Then I turn that data frame over to zoo, because I want to handle it as a time series:

library("zoo")
df <- zoo(df$Value, df$Tid)

Now I want to produce a smooth scatter plot of the values against the time of day each measurement was taken (i.e. discard the date and keep only the time), which supposedly can be done something like this: https://stat.ethz.ch/pipermail/r-help/2009-March/191302.html

But the time() function doesn't seem to return any times at all; it just returns a number sequence. Whatever I try from that link, I can't get a scatter plot of values over an average day. The data.frame code that actually does work (without using a zoo time series) looks like this (i.e. extracting the hour from the time and converting it to numeric):

smoothScatter(data.frame(as.numeric(format(df$Tid, "%H")), df$Value))

Another thing I want to do is produce a density plot of how many measurements I have per hour. I have plotted by hour using a regular data.frame with no problems, so the data itself is fine. But when I try the zoo approaches I have found through Google, I get either errors or wrong results.

I did manage to get something plotted through this line:

plot(density(as.numeric(trunc(time(df),"01:00:00"))))

But it is not correct. Again it seems to just produce a sequence from 1 to 217, whereas I wanted it to discard the date and keep only the time, rounded to the hour.

I am able to plot this:

plot(density(df))

This produces a density plot of the values, but I want a density plot of how many values were recorded per hour of the day.

So, if someone could please help me sort this out, that would be great. In short, what I want to do is:

1) smoothScatter(x-axis: time of day (0-24), y-axis: value)

2) plot(density(x-axis: time of day (0-24)))

EDIT:

library("zoo")
df <- data.frame(
  Tid = strptime(c("2011-01-14 12:00:00", "2011-01-31 07:00:00",
                   "2011-02-05 09:36:00", "2011-02-27 10:19:00"),
                 "%Y-%m-%d %H:%M"),
  Values = c(50, 52, 51, 52)
)
df <- zoo(df$Values,df$Tid)
summary(df)
df.hr <- aggregate(df, trunc(df, "hours"), mean)
summary(df.hr)
png("temp.png")
plot(df.hr)
dev.off()

This code uses some of my actual values. I would have expected the plot of "df.hr" to show an hourly average, but instead I get some weird new index that is not time at all...

The code in the question did not define the times (the second argument to zoo was omitted), so it assumed 1:nrow(df) as the times. The zoo object you want is zoo(df$Value, df$Tid) or read.zoo(df). - G. Grothendieck
Aha, I was loading the data incorrectly. Thanks. But I still can't get a satisfactory result. If I run df.hr <- aggregate(df, trunc(df, "01:00:00"), mean) then I just get fifteen values like 99 90 87 88 89 91 92 86 85 84 83 78.60000 80.20000 81.23333 82.62500 83.30000 84.51818 85.35000 86.52353 87.46316 88.52162 89.50435 82 81 80 78 90.36047 91.20000 92.10000 99.90000 which is nothing like the hours within a day... - GaRyu
You will need to provide something reproducible. It can't really be answered in its current form. - G. Grothendieck
I added a code snippet at the end that might illustrate my problem. The index that I get after truncating on hours isn't related to time, so the plot just looks way off... - GaRyu
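
To make the first comment concrete, here is a minimal sketch of how the index differs depending on whether the second argument to zoo() is supplied. It reuses the four sample rows from the question's EDIT; the names values, tid, z_no_index and z_indexed are made up for illustration, and the UTC time zone is an assumption chosen only so that the result does not depend on the local time zone.

```r
library(zoo)

# Sample rows from the question's EDIT (time zone UTC is an assumption)
values <- c(50, 52, 51, 52)
tid <- as.POSIXct(c("2011-01-14 12:00", "2011-01-31 07:00",
                    "2011-02-05 09:36", "2011-02-27 10:19"),
                  format = "%Y-%m-%d %H:%M", tz = "UTC")

# Without order.by, zoo falls back to a plain integer index 1, 2, 3, ...
z_no_index <- zoo(values)

# With the timestamps passed as the second argument, the index is date-time
z_indexed <- zoo(values, tid)

print(time(z_no_index))        # a number sequence, not times
print(class(time(z_indexed)))  # a date-time class
```

If time() returns a plain number sequence, the timestamps most likely never reached the zoo object in the first place.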

1 Answer


There are three problems with the aggregate statement in the question:

  1. We wish to truncate the times, not df.

  2. trunc.POSIXt unfortunately returns a POSIXlt result, so it needs to be converted back to POSIXct.

  3. It seems you did not intend to truncate to the hour in the first place, but wanted to extract the hour of the day.

To address the first two points the aggregate statement needs to be changed to:

tt <- as.POSIXct(trunc(time(df), "hours"))
aggregate(df, tt, mean)

but to address the last point it needs to be changed entirely to:

tt <- as.POSIXlt(time(df))$hour
aggregate(df, tt, mean)
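
Putting this together with the two plots asked for in the question, something along these lines might work. It is only a sketch on the four sample rows from the EDIT: the names z, lt, hour_of_day and z_by_hour are made up for illustration, the UTC time zone is an assumption, and using fractional hours (hour plus minutes and seconds) rather than whole hours for the plots is a choice made here, not something the question prescribes.

```r
library(zoo)

# Sample rows from the question's EDIT (time zone UTC is an assumption)
tid <- as.POSIXct(c("2011-01-14 12:00:00", "2011-01-31 07:00:00",
                    "2011-02-05 09:36:00", "2011-02-27 10:19:00"),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
z <- zoo(c(50, 52, 51, 52), tid)

# Break the index into components; the date part is simply ignored below
lt <- as.POSIXlt(time(z))

# Time of day as fractional hours in [0, 24)
hour_of_day <- lt$hour + lt$min / 60 + lt$sec / 3600

# Mean value per whole hour of the day (the index becomes the hour)
z_by_hour <- aggregate(z, lt$hour, mean)

# 1) value against time of day
smoothScatter(hour_of_day, coredata(z), xlim = c(0, 24),
              xlab = "Hour of day", ylab = "Value")

# 2) density of measurement times over the day
plot(density(hour_of_day, from = 0, to = 24),
     main = "Measurements by time of day", xlab = "Hour of day")
```

Note that density() treats the hours as points on a line rather than on a 24-hour circle, so measurements just before and just after midnight are not smoothed together.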