
The data frame sg looks like this:

                    time         B C D
1  2014-08-04 00:00:04.0       red 0 0
2  2014-08-04 00:00:06.0       red 0 0
3  2014-08-04 00:00:06.0       red 1 0
4  2014-08-04 00:00:06.2       red 0 0
5  2014-08-04 00:00:06.5       red 0 0
6  2014-08-04 00:00:07.0       red 0 1
7  2014-08-04 00:00:07.7       red 0 0
8  2014-08-04 00:00:16.0       red 0 0
9  2014-08-04 00:00:17.0       red 1 0
10 2014-08-04 00:00:18.0       red 0 0
11 2014-08-04 00:00:22.0       red 0 0
12 2014-08-04 00:00:22.0       red 0 0
13 2014-08-04 00:00:22.2       red 0 0
14 2014-08-04 00:00:25.0       red 1 0
15 2014-08-04 00:00:27.0       red 1 0
16 2014-08-04 00:00:28.0       red 0 0
17 2014-08-04 00:00:29.0 red/amber 1 0
18 2014-08-04 00:00:29.0 red/amber 1 1
19 2014-08-04 00:00:30.0     green 0 0
20 2014-08-04 00:00:40.0     green 0 1
21 2014-08-04 00:00:42.4     green 0 0
22 2014-08-04 00:00:43.0     green 0 0
23 2014-08-04 00:00:50.0       red 1 0
24 2014-08-04 00:00:51.2       red 0 0
25 2014-08-04 00:00:52.0       red 0 1
26 2014-08-04 00:00:52.0       red 1 0
27 2014-08-04 00:00:52.2       red 1 0
28 2014-08-04 00:00:52.9       red 1 1
29 2014-08-04 00:00:53.0       red 0 0
30 2014-08-04 00:00:59.0       red 0 1
31 2014-08-04 00:01:02.0       red 0 1
32 2014-08-04 00:01:03.2       red 0 1
33 2014-08-04 00:01:04.0       red 1 1
34 2014-08-04 00:01:06.4       red 0 1
35 2014-08-04 00:01:07.5       red 1 1
36 2014-08-04 00:01:08.0       red 0 1
37 2014-08-04 00:01:08.2       red 0 1
38 2014-08-04 00:01:08.4       red 0 1
39 2014-08-04 00:01:11.0       red 0 1
40 2014-08-04 00:01:13.0       red 0 1
41 2014-08-04 00:01:14.0       red 0 1
42 2014-08-04 00:01:15.0 red/amber 0 1
43 2014-08-04 00:01:15.0 red/amber 0 1
44 2014-08-04 00:01:16.0     green 0 1
45 2014-08-04 00:01:21.0     green 0 0
46 2014-08-04 00:01:26.0     green 0 0
47 2014-08-04 00:01:31.0     amber 0 0
48 2014-08-04 00:01:31.0     amber 0 0
49 2014-08-04 00:01:34.0       red 0 0
50 2014-08-04 00:01:36.0       red 0 0

First, I need to split the data frame into groups by time interval (for example, 10 seconds). Second, I need to calculate the percentage of rows with value 1 in each group, separately for columns C and D. Finally, I want to plot the percentages for C and D against time in one graphic.

I have done this for a single variable. My solution is:

library(plyr)     # ddply
library(ggplot2)

# Fraction of rows in a group where C == 1
percentage.occupied <- function(x) NROW(subset(x, C == 1)) / NROW(x)

# cut() on date-times expects the unit "secs" rather than "seconds"
splitbytime <- ddply(selectstatus309, .(cut(time, "10 secs")), percentage.occupied)
colnames(splitbytime) <- c("time", "occupancy")

occupancy <- ggplot(splitbytime, aes(x = as.POSIXct(time), y = occupancy)) +
  geom_point(shape = 1) +
  geom_smooth() +
  xlab("time") +
  ylab("% occupancy")

The result looks like the picture below, which I plotted for column C. What I need is to plot the percentages for C and D together in one graphic.

I am not sure whether I have described my question clearly (┬_┬)

[Image: occupancy plot for column C]

I took BrodieG's solution and applied it to a one-hour period of my data. I followed each step, but the plot looks wrong:

[Image: the resulting plot]

There is also an error:

geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in smooth.construct.cr.smooth.spec(object, data, knots) : 
  x has insufficient unique values to support 10 knots: reduce k.

I doubt the error is the reason for the strange plot. Below is part of the melted data frame; from it I infer that the plotted means cannot all be exactly 0 or 1.

                     time          B variable value
10520 2014-08-04 15:10:00      green     dt_5     0
10521 2014-08-04 15:10:00      green     dt_5     0
10522 2014-08-04 15:10:00      green     dt_5     0
10523 2014-08-04 15:10:00      green     dt_5     0
10524 2014-08-04 15:10:00      green     dt_5     0
10525 2014-08-04 15:10:00      green     dt_5     0
10526 2014-08-04 15:10:00      green     dt_5     0
10527 2014-08-04 15:10:00      green     dt_5     0
10528 2014-08-04 15:10:00      green     dt_5     1
10529 2014-08-04 15:10:00      amber     dt_5     1
10530 2014-08-04 15:10:00      amber     dt_5     1
10531 2014-08-04 15:10:00      amber     dt_5     1
10532 2014-08-04 15:10:00      amber     dt_5     1
10533 2014-08-04 15:10:00      amber     dt_5     1
10534 2014-08-04 15:10:00      amber     dt_5     1
10535 2014-08-04 15:10:00      amber     dt_5     0
10536 2014-08-04 15:10:00      amber     dt_5     0
10537 2014-08-04 15:10:00      amber     dt_5     0
10538 2014-08-04 15:10:00      amber     dt_5     0
10539 2014-08-04 15:10:00      amber     dt_5     0
10540 2014-08-04 15:10:00      amber     dt_5     0
10541 2014-08-04 15:10:00        red     dt_5     0
10542 2014-08-04 15:10:00        red     dt_5     0
10543 2014-08-04 15:10:00        red     dt_5     0
10544 2014-08-04 15:10:00        red     dt_5     0
10545 2014-08-04 15:10:00        red     dt_5     0

The code is here:

selectstatus309.mlt <- melt(selectstatus309,id.var=c("time","B"))

percentage<-
  ggplot(selectstatus309.mlt, aes(x=time,y=value,color=variable))+
  stat_summary(geom="point", fun.y =mean,shape=1)+
  stat_smooth()+
  facet_wrap(~ B)

Sorry for the looooong and verbose story! T。T

Comment: percentage.occupied <- function(x) NROW(subset(x, C==1)) / NROW(x) is equivalent to mean(x$C), since C only holds 0 and 1. - Vlo

1 Answer


Here is an option. First we bin the timestamps into 10-second intervals:

library(reshape2)
library(ggplot2)
df$time <- as.POSIXct(cut(as.POSIXct(df$time), "10 secs"))

Then we melt it so that the values in C and D end up in the same column, which lets us use the column name as an aesthetic. This is the key step for getting the two plots into the same graphic, as you want. Inspect df.mlt to see how it differs from df. ggplot likes data in long format so it can use its built-in data segmentation tools.

df.mlt <- melt(df, id.var=c("time", "B"))
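To make the wide-versus-long difference concrete, here is a minimal sketch on a three-row toy data frame. The values and the name `wide` are made up for illustration; only the column layout (time, B, C, D) follows the question.

```r
library(reshape2)

# Toy wide data (hypothetical values, same column layout as the question).
wide <- data.frame(
  time = as.POSIXct("2014-08-04 00:00:00", tz = "UTC") + c(0, 0, 10),
  B = c("red", "red", "green"),
  C = c(0, 1, 0),
  D = c(1, 0, 0)
)

# Long format: one row per (time, B, measured column) combination.
# The former column names C and D become values of `variable`.
long <- melt(wide, id.vars = c("time", "B"))
str(long)
```

Each original row turns into two rows, one for C and one for D, which is what allows `color = variable` to separate the two series.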

Then we use stat_summary to plot the dots (no need to resort to ddply):

# In ggplot2 >= 3.3.0 the argument is `fun`; older versions call it `fun.y`.
ggplot(df.mlt, aes(x=time, y=value, color=variable)) + 
  stat_summary(geom="point", fun=mean, shape=1) + 
  stat_smooth()

produces (on your subset of data):

[Image: proportion of 1s over time for C and D, with smoothed curves]
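As a numeric cross-check on the plotted points, the per-interval proportions can also be computed directly. The sketch below uses base R only and rebuilds the first ten rows of the question's data by hand (the B column is left out, and the UTC time zone is an assumption). Because C and D only hold 0 and 1, the mean within each 10-second bin is the share of 1s.

```r
# Toy data: the first ten rows of the question's data frame (B omitted).
df <- data.frame(
  time = as.POSIXct("2014-08-04 00:00:00", tz = "UTC") +
    c(4, 6, 6, 6.2, 6.5, 7, 7.7, 16, 17, 18),
  C = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  D = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
)

# Bin timestamps into 10-second intervals, as in the first step.
df$time <- as.POSIXct(cut(df$time, "10 secs"), tz = "UTC")

# For a 0/1 column the mean is the share of 1s, so this gives one
# proportion per interval for each of C and D.
props <- aggregate(cbind(C, D) ~ time, data = df, FUN = mean)
print(props)
```

With "secs" breaks, cut() appears to start the first bin at the earliest timestamp in the data (00:00:04 here) rather than on a round 10-second boundary, so bin edges depend on where the data begin.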

Note how I'm able to split out the data by whether it is "C" or "D". You can even facet by B:

# In ggplot2 >= 3.3.0 the argument is `fun`; older versions call it `fun.y`.
ggplot(df.mlt, aes(x=time, y=value, color=variable)) + 
  stat_summary(geom="point", fun=mean, shape=1) + 
  stat_smooth() +
  facet_wrap(~ B)

[Image: the same plot faceted by B]
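On larger data, stat_smooth() may switch to a GAM and stop with "x has insufficient unique values to support 10 knots: reduce k". That message suggests some group has fewer distinct time bins than the spline's default basis needs. One possible workaround, sketched below on simulated data (the data frame `long` and its values are made up), is to choose the smoothing method explicitly.

```r
library(ggplot2)

# Hypothetical long-format data: only six distinct time bins per series.
set.seed(1)
long <- data.frame(
  time = rep(as.POSIXct("2014-08-04 00:00:00", tz = "UTC") + 10 * (0:5),
             each = 20, times = 2),
  variable = rep(c("C", "D"), each = 120),
  value = rbinom(240, 1, 0.3)
)

p <- ggplot(long, aes(x = time, y = value, color = variable)) +
  # `fun` is the ggplot2 >= 3.3.0 name; older versions use `fun.y`.
  stat_summary(geom = "point", fun = mean, shape = 1) +
  # Option 1: a straight-line fit needs no knots at all.
  stat_smooth(method = "lm", formula = y ~ x)
  # Option 2 (needs mgcv): ask for fewer knots than distinct x values, e.g.
  # stat_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 4))
print(p)
```

Which option fits best depends on the data; a linear fit is the bluntest choice, while lowering k keeps a smooth curve as long as every facet and series still has at least k distinct time bins.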