Adding points to plot using ggplot2

Question

Here are the first 9 rows (out of 54) and the first 8 columns (out of 1003) of my dataset:

 stream n rates     means          1         2         3         4
 1   Brooks 3   3.0 0.9629152 0.42707006 1.9353659 1.4333884 1.8566225
 2  Siouxon 3   3.0 0.5831929 0.90503736 0.2838483 0.2838483 1.0023212
 3 Speelyai 3   3.0 0.6199235 0.08554021 0.7359903 0.4841935 0.7359903
 4   Brooks 4   7.5 0.9722707 1.43338843 1.8566225 0.0000000 1.3242210
 5  Siouxon 4   7.5 0.5865031 0.50574543 0.5057454 0.2838483 0.4756304
 6 Speelyai 4   7.5 0.6118634 0.32252396 0.4343109 0.6653132 2.2294652
 7   Brooks 5  10.0 0.9637475 0.88984211 1.8566225 0.7741612 1.3242210
 8  Siouxon 5  10.0 0.5804420 0.47501800 0.7383634 0.5482181 0.6430847
 9 Speelyai 5  10.0 0.5959238 0.15079491 0.2615963 0.4738504 0.0000000

Here is a simple plot I made using the values in the means column for all rows with the stream name Speelyai (18 rows).

The means column is calculated by taking the mean of the entire row. Each numbered column represents one simulation, so the means column is the mean of 1000 simulations. I would like to plot the actual simulation values on the plot as well. I think it would be informative to show not only the mean (as a line) but also the "raw" data (the simulations) as points. I see that I can use geom_point(), but I am not sure how to get all the points for every row that has the stream name "Speelyai".

Thanks

As you can see, the scales are very different, which I would expect, given that these points are results from simulations (resampling the original data). But how could I overlay these points on my original image in a way that still preserves the visual content? In this image the line looks almost flat, but in my original image we can see that it fluctuates quite a bit, just on a small scale.

Nick Kennedy · Accepted Answer · 2015-08-06T15:19:16

I would suggest reformatting your data in a long format rather than wide. For example:

library("tidyr")
library("ggplot2")
my_data_tidy <- gather(my_data, column, value, -c(stream, n, rates, means))
ggplot(subset(my_data_tidy, stream == "Speelyai"), aes(rates, value)) +
  geom_point() +
  stat_summary(fun = "mean", geom = "line")  # use fun.y instead on ggplot2 < 3.3.0

Note this will also recalculate the means from your data. If you wanted to use your existing means, you could do:

ggplot(subset(my_data_tidy, stream == "Speelyai"), aes(rates, value)) +
  geom_point() +
  geom_line(aes(rates, means), data = subset(my_data, stream == "Speelyai"))
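
To try the reshape without the full dataset, here is a minimal sketch on made-up data. The data frame toy, its values, and the seed are assumptions for illustration only (6 rows and 5 simulation columns rather than 54 and 1000), and alpha = 0.3 is my own addition to lighten overlapping points, not something the approach requires.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for my_data: 6 rows, 5 simulation columns named "1".."5"
set.seed(1)
sims <- matrix(rexp(6 * 5), nrow = 6,
               dimnames = list(NULL, as.character(1:5)))
toy <- data.frame(
  stream = rep(c("Brooks", "Speelyai"), times = 3),
  n      = rep(3:5, each = 2),
  rates  = rep(c(3, 7.5, 10), each = 2),
  means  = rowMeans(sims),   # row-wise mean, as described in the question
  sims,
  check.names = FALSE        # keep the numeric column names as they are
)

# Wide -> long: one row per (original row, simulation) pair
toy_long <- gather(toy, column, value, -c(stream, n, rates, means))
speelyai <- subset(toy_long, stream == "Speelyai")

# Raw simulations as points, recomputed mean as a line
# (fun = requires ggplot2 >= 3.3.0; older versions use fun.y)
p <- ggplot(speelyai, aes(rates, value)) +
  geom_point(alpha = 0.3) +
  stat_summary(fun = "mean", geom = "line")
print(p)
```

Each of the 6 rows contributes 5 rows to toy_long, so the Speelyai subset holds 15 points, and the line drawn by stat_summary() passes through the same values stored in the means column.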
