I have a CSV file that contains one line for each (Java GC) event I am interested in. Each record consists of a subsecond timestamp (not equidistant) and some variables. I load the data like this:
gcdata <- read.table("http://bernd.eckenfels.net/view/gc1001.ygc.csv",header=TRUE,sep=",", dec=".")
start = as.POSIXct(strptime("2012-01-01 00:00:00", format="%Y-%m-%d %H:%M:%S"))
gcdata.date = gcdata$Timestamp + start
gcdata = gcdata[,2:7] # remove old date col
gcdata=data.frame(date=gcdata.date,gcdata)
str(gcdata)
This results in:
'data.frame': 2997 obs. of 7 variables:
$ date : POSIXct, format: "2012-01-01 00:00:06" "2012-01-01 00:00:06" "2012-01-01 00:00:18" ...
$ Distance.s. : num 0 0.165 11.289 9.029 11.161 ...
$ YGUsedBefore.K.: int 1610619 20140726 20148325 20213304 20310849 20404772 20561918 21115577 21479211 21544930 ...
$ YGUsedAfter.K. : int 7990 15589 80568 178113 272036 429182 982841 1346475 1412181 1355412 ...
$ Promoted.K. : int 0 0 0 0 8226 937 65429 71166 62548 143638 ...
$ YGCapacity.K. : int 22649280 22649280 22649280 22649280 22649280 22649280 22649280 22649280 22649280 22649280 ...
$ Pause.s. : num 0.0379 0.022 0.0287 0.0509 0.109 ...
In this case I care about the pause time (in seconds). I want to plot a diagram that shows, for each (wall clock) hour, the mean as a line, the range between the 2% and 98% quantiles as a grey corridor, and the maximum value (within each hour) as a red line.
I have done some work, but the q02/q98 helper functions are ugly, needing a separate statement for every line seems wasteful, and I don't know how to draw a grey area between q02 and q98:
q02 <- function(x, ...) quantile(x, probs = 0.02) # 2% quantile (was 0.2, i.e. 20%)
q98 <- function(x, ...) quantile(x, probs = 0.98) # 98% quantile
hours = droplevels(cut(gcdata$date, breaks="hours")) # can I have 2 hours?
plot(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=max),ylim=c(0,2), col="red", ylab="Pause(s)", xlab="Days") # Is always black?
lines(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=q98),ylim=c(0,2), col="green")
lines(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=q02),ylim=c(0,2), col="green")
lines(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=mean),ylim=c(0,2), col="blue")
This produces a chart with black dots for the maximum, a blue line for the hourly mean, and green lines for the lower (q02) and upper (q98) quantiles. I think it would be more readable with a grey corridor between the quantiles, perhaps a dashed red line for the maximum, and properly formatted axis labels.
Any suggestions? (The file is available at the URL above.)
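To make the target concrete, here is a minimal sketch of the kind of plot described above, using base graphics only. It is an illustration under stated assumptions, not a definitive solution: the data is a synthetic stand-in with only the `date` and `Pause.s.` columns, and the names `bin_seconds`, `agg` and `stats`, the colours, and the six-hour tick spacing are all made up for the example. It computes all four statistics in a single `aggregate()` call and draws the corridor with `polygon()`.

```r
# Synthetic stand-in for gcdata (assumption: only date and Pause.s. are needed)
set.seed(1)
n <- 3000
gcdata <- data.frame(
  date     = as.POSIXct("2012-01-01 00:00:00", tz = "UTC") +
             sort(runif(n, 0, 48 * 3600)),
  Pause.s. = rexp(n, rate = 20)
)

# Bin start in seconds since the epoch; 7200 would give two-hour bins.
# (cut() on date-times also accepts breaks such as "2 hours".)
bin_seconds <- 3600
gcdata$bin <- floor(as.numeric(gcdata$date) / bin_seconds) * bin_seconds

# One aggregate() call; FUN returns a named vector, so the result
# holds a matrix column with one column per statistic.
agg <- aggregate(Pause.s. ~ bin, data = gcdata, FUN = function(x)
  c(q02  = unname(quantile(x, 0.02)),
    mean = mean(x),
    q98  = unname(quantile(x, 0.98)),
    max  = max(x)))

# Flatten the matrix column and turn the bin start back into a date-time
stats <- data.frame(
  bin = as.POSIXct(agg$bin, origin = "1970-01-01", tz = "UTC"),
  agg$Pause.s.
)

# Empty frame first, then the corridor, then the lines on top of it
plot(stats$bin, stats$max, type = "n", xaxt = "n",
     ylim = c(0, max(stats$max)), xlab = "Time", ylab = "Pause (s)")
polygon(c(stats$bin, rev(stats$bin)), c(stats$q02, rev(stats$q98)),
        col = "grey80", border = NA)           # grey 2%..98% corridor
lines(stats$bin, stats$mean, col = "blue")     # mean per bin
lines(stats$bin, stats$max, col = "red", lty = 2)  # dashed maximum

# Custom time axis with a tick every six hours
axis.POSIXct(1, at = seq(min(stats$bin), max(stats$bin), by = "6 hours"),
             format = "%d %b %H:%M")
```

The corridor works because `polygon()` closes the shape itself: the x values run forward along the lower quantile and backward along the upper one. With the real data, only the synthetic `gcdata` block would need to be replaced.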