I'm trying to create a histogram from time-series data in R, similar to this question. Each bin should show the total duration for which the value fell within that bin. I have non-integer sample times in a zoo object with thousands of rows. The timestamps are irregular, and the data is assumed to be constant between consecutive timestamps (sample-and-hold).
Example data:
library(zoo)
library(ggplot2)
library(scales)  # provides date_format(), used in the plot below
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5", "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
ggplot(x.df, aes(x = Date, y = Value)) + geom_step() + scale_x_datetime(labels = date_format("%H:%M:%OS"))
Please see the time-series plot here. Creating a histogram with hist(z, freq = TRUE) ignores the timestamps and only counts samples: Plot from hist method.
My desired output is a histogram with duration in seconds on the y-axis, something like this: Histogram with non-integer duration on y-axis.
Edit:
I should point out that the data values are not integers, and that I want to be able to control the bin width(s). I could use diff(timestamp) to create a (non-integer) column holding the duration of each point, and then plot a bar graph as suggested by @MKR:
x.df = data.frame(DurationSecs = as.numeric(diff(timestamp), units = "secs"), Value = data[-length(data)])
ggplot(x.df, aes(x = Value, y = DurationSecs)) + geom_bar(stat = "identity")
This gives the right bar heights for the example, but it fails when the values are floating-point numbers: every distinct value gets its own bar, so nothing is actually binned.
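To make the target concrete, here is a minimal base-R sketch of the binning I have in mind, not a definitive solution. It assumes the last sample is dropped because its duration is unknown, and the bin edges in `breaks` (width 2, from 0 to 6) are hypothetical values chosen only for this example.

```r
# Example data from the question
timestamp <- as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
                          "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0",
                          "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0",
                          "2018-02-21 15:00:12.0"), tz = "GMT")
data <- c(0, 3, 5, 1, 3, 0, 2)

# Sample-and-hold: each value lasts until the next timestamp.
# The last sample has no known duration, so it is dropped (assumption).
duration_secs <- as.numeric(diff(timestamp), units = "secs")
value <- data[-length(data)]

# Hypothetical bin edges; change `by` to control the bin width
breaks <- seq(0, 6, by = 2)
bin <- cut(value, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Total duration per bin; bins with no samples come back as NA, so set them to 0
total_secs <- tapply(duration_secs, bin, sum)
total_secs[is.na(total_secs)] <- 0

# Bars touch (space = 0) so the plot reads like a histogram
barplot(total_secs, space = 0, xlab = "Value bin", ylab = "Duration (s)")
```

Because `cut()` works on arbitrary numeric values, this does not depend on the values being integers. In ggplot2, mapping the duration to the `weight` aesthetic of `geom_histogram()` together with a `binwidth` should give a similar plot, although the default bin boundaries may differ from the ones above.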