ggplot dots based on sample size with customized range R

Question

I have a test dataset that I want to graph:

 Week   M
50  0.082474227
50  0.100694444
50  0.079037801
50  0.090277778
50  0.083333333
50  0.097222222
50  0.09375
50  0.104166667
12  0.079861111
12  0.104166667
12  0.09375
12  0.090277778
80  0.079861111
80  0.128472222
80  0.052083333
80  0.09375
80  0.120274914
80  0.118055556
80  0.121527778
80  0.097222222
80  0.069444444
80  0.145833333
80  0.065972222
80  0.045138889
80  0.083333333
80  0.079861111
80  0.092783505
80  0.113402062
80  0.090277778
80  0.134020619
80  0.118055556

I want to plot the mean values for weeks 12, 50 and 80 with error bars, and size the dots according to the sample sizes.

First, I need to compute summary statistics for the dataset:

SEsum <- summarySE(data, measurevar="M", groupvars="Week")

Next, I want to plot the graph:

ggplot(SEsum, aes(x=Week, y=M)) + 
geom_errorbar(aes(ymin=M-se, ymax=M+se), width=3) +
geom_line() +
geom_point(aes(size= N))+
scale_x_continuous(breaks=c(12,50,80), labels=c("Wk12", "Wk50", "Wk80"))

The plot looks like this:

Everything looks good, except that I would like to customize the range of sample sizes used to scale the dot sizes.

In the graph, that is the legend where N is set to '4', '8', '12' and '16'; in the code, it is the part that says 'geom_point(aes(size= N))'. I want the minimum sample size to be 1 and the maximum to be 50 and, if possible, only 3 legend entries (the plot here gives 4), because there are only 3 time points.

The reason is that I need to produce 26 such graphs from 26 different datasets with different sample sizes, and I would like to standardize the range so that the graphs are easy to compare when placed side by side.

Mark Peterson · Accepted Answer · 2016-12-12T19:47:21

I think that you are looking for scale_size to control the sizing of those points. Here, I set the limits to c(0,50) to standardize the range across plots, and set the breaks to match the actual values from this plot:

ggplot(SEsum, aes(x=Week, y=M)) + 
  geom_errorbar(aes(ymin=M-se, ymax=M+se), width=3) +
  geom_line() +
  geom_point(aes(size= N))+
  scale_x_continuous(breaks=c(12,50,80), labels=c("Wk12", "Wk50", "Wk80")) +
  scale_size(limits = c(0, 50)
             , breaks = unique(SEsum$N))

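gives a plot in which the point sizes are scaled against the 0 to 50 range and the legend shows one entry per sample size present in SEsum$N.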

However, note that such size distinctions may prove misleading (humans aren't great at that kind of comparison). You may be better off labelling the sample sizes more explicitly. This also reduces the amount of code needed to generate the plots, as you can use stat_summary instead, which defaults to the mean +/- SE that you are plotting here.

First, generate the labels for each week. Here, I am using dplyr (which will conflict substantially with plyr, which was loaded automatically inside the function you were using).

library(dplyr)

# Count the observations per week and build a two-line axis label
forLabel <-
  data %>%
  group_by(Week) %>%
  summarise(Count = n()) %>%
  mutate(Label = paste0("Wk ", Week, "\n(n = ", Count, ")"))

Returns:

   Week Count           Label
  <int> <int>           <chr>
1    12     4  Wk 12\n(n = 4)
2    50     8  Wk 50\n(n = 8)
3    80    19 Wk 80\n(n = 19)

Then, we can use that to label the axis instead of the hard coding you were doing before:

ggplot(data
       , aes(x=Week, y=M)) +
  stat_summary() +
  scale_x_continuous(breaks = forLabel$Week
                     , labels = forLabel$Label)

This gives a plot of the mean +/- SE for each week, with the sample size shown in each x-axis label.
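
If you do stay with sized points, the scale_size settings from the first approach can be wrapped up so that all 26 plots share one legend. What follows is only a minimal sketch: the function names standard_size_scale and plot_weeks are hypothetical, the limits c(1, 50) follow the range asked for in the question (rather than the c(0, 50) I used above), and the breaks c(1, 25, 50) are an assumption chosen to give three legend entries that do not depend on any one dataset. The summary values in example_summary are placeholders for illustration, not computed from the data in the question.

```r
library(ggplot2)

# Hypothetical helper: one size scale reused by every plot.
# Fixed limits keep the size mapping identical across plots;
# fixed breaks keep the legend at three entries regardless of the data.
standard_size_scale <- function() {
  scale_size(limits = c(1, 50), breaks = c(1, 25, 50))
}

# Hypothetical wrapper around the plot from the question. `se_sum` is
# assumed to have the columns Week, M, se and N, as returned by summarySE().
plot_weeks <- function(se_sum) {
  ggplot(se_sum, aes(x = Week, y = M)) +
    geom_errorbar(aes(ymin = M - se, ymax = M + se), width = 3) +
    geom_line() +
    geom_point(aes(size = N)) +
    scale_x_continuous(breaks = c(12, 50, 80),
                       labels = c("Wk12", "Wk50", "Wk80")) +
    standard_size_scale()
}

# Placeholder summary table (illustrative values only)
example_summary <- data.frame(
  Week = c(12, 50, 80),
  N    = c(4, 8, 19),
  M    = c(0.092, 0.091, 0.097),
  se   = c(0.005, 0.003, 0.006)
)

p <- plot_weeks(example_summary)
```

With a list of 26 summary tables (say, a hypothetical all_summaries), lapply(all_summaries, plot_weeks) would then build every plot with the same size legend. Note that any N outside the limits is dropped from the plot with a warning, so the upper limit needs to cover the largest sample size across all datasets.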
