Let's say I have data consisting of the time I leave the house and the number of minutes it takes me to get to work. I'll have some repeated values:

08:00, 20
08:04, 25
08:30, 40
08:20, 23
08:04, 22

Some departure times repeat (like 08:04). What I want is a scatter plot that is correctly scaled on the x-axis but allows multiple values per departure time, so that I can see the trend.

Is a time-series even what I want to be using? I've been able to plot a time series graph that has one value per time, and I've gotten multiple values plotted but without the time-series scaling. Can anyone suggest a good approach? Preference for ggplot2 but I'll take standard R plotting if it's easier.

1 Answer

First, let's prepare some more data:

set.seed(123)
df <- data.frame(Time = paste0("08:", sample(35:55, 40, replace = TRUE)), 
                 Length = sample(20:50, 40, replace = TRUE), 
                 stringsAsFactors = FALSE)
df <- df[order(df$Time), ]
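# Number repeated departure times 1, 2, ... (df is already sorted by Time)
df$Attempt <- sequence(rle(df$Time)$lengths)
head(df)
    Time Length Attempt
6  08:35     24       1
18 08:35     43       2
35 08:35     34       3
15 08:37     37       1
30 08:38     33       1
38 08:39     38       1

# Convert to date-time so the Time axis is scaled properly (the date defaults to today)
df$Time <- as.POSIXct(df$Time, format = "%H:%M")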
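
To make the Attempt column concrete, here is a minimal sketch on the five rows from the question. The data frame name `toy` is only for illustration, and I use `ave()` instead of `rle()` here because it does not require the rows to be sorted first; treat it as one possible way to number repeats, not the only one.

```r
# Toy data: the five rows from the question (the name `toy` is hypothetical)
toy <- data.frame(Time = c("08:00", "08:04", "08:30", "08:20", "08:04"),
                  Length = c(20, 25, 40, 23, 22),
                  stringsAsFactors = FALSE)

# ave() applies seq_along() within each group of equal Time values,
# so every repeated departure time is numbered 1, 2, ... in row order
toy$Attempt <- ave(seq_along(toy$Time), toy$Time, FUN = seq_along)
toy
```

Only the second 08:04 row gets Attempt 2; every other row is the first (and only) attempt at its time.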

As I understand it, you want to preserve the order of observations that share the same departure time. At first I ignored that and got a scatter plot with this code:

library(ggplot2)

ggplot(data = df, aes(x = Length, y = Time)) + 
  geom_point(aes(size = Length, colour = Length)) +               # one point per trip
  geom_path(aes(group = Time, colour = Length), alpha = 1/3) +    # link trips that share a departure time
  scale_size(range = c(2, 7)) + theme(legend.position = 'none')

But once we consider three dimensions (Time, Length and Attempt), a scatter plot can no longer show all the information. I hope I understood you correctly and this is what you are looking for:

ggplot(data = df, aes(y = Time, x = Attempt)) + geom_tile(aes(fill = Length))
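
If you only need the departure time on a properly scaled x-axis, as the question asks, a plain scatter plot may already be enough, since repeated times simply stack vertically. Below is a minimal sketch under that assumption, using the five rows from the question. The data frame name `commute`, the fixed UTC time zone and the straight-line trend are my own choices for illustration, not something the question specifies.

```r
library(ggplot2)

# Toy data: the five rows from the question (the name `commute` is hypothetical)
commute <- data.frame(Time = c("08:00", "08:04", "08:30", "08:20", "08:04"),
                      Length = c(20, 25, 40, 23, 22),
                      stringsAsFactors = FALSE)

# Parse as date-time; the date defaults to today and only the time of day matters.
# The time zone is fixed to UTC (an assumption) so the axis labels match the input.
commute$Time <- as.POSIXct(commute$Time, format = "%H:%M", tz = "UTC")

p <- ggplot(commute, aes(x = Time, y = Length)) +
  geom_point(size = 3, alpha = 0.6) +                         # repeated times stack vertically
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # rough linear trend (assumption)
  scale_x_datetime(date_labels = "%H:%M") +                   # label the axis as clock time
  labs(x = "Departure time", y = "Minutes to work")
print(p)
```

With real data you would probably swap the linear fit for a smoother that suits the shape of your commute times.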