
I created a bar chart using geom_bar with "Group" on the x-axis (Female, Male), and "Values" on the y-axis. Group is further subdivided into "Session" such that there is "Session 1" and "Session 2" for both Male and Female (i.e. four bars in total).

Since all participants took part in both Session 1 and Session 2, I overlaid a dotplot (geom_dotplot) on each of the four bars to represent the individual data.

I am now trying to connect each participant's ("PID") observations between Session 1 and Session 2. In other words, there should be lines connecting several pairs of points (one pair per participant) on the "Male" portion of the x-axis, and likewise on the "Female" portion.

I tried this with "geom_line" (below) but to no avail (instead, it created a single vertical line in the middle of "Male" and another in the middle of "Female"). I'm not too sure how to fix this.

See code below:

ggplot(data_foo, aes(x=factor(Group),y=Values, colour = factor(Session), fill = factor(Session))) + 
          geom_bar(stat = "summary", fun.y = "mean", position = "dodge") + 
          geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 1.0, position = "dodge", fill = "black") +
          geom_line(aes(group = PID), colour="dark grey") +
          labs(title='My Data',x='Group',y='Values') +
          theme_light() 

Sample data (.txt)

data_foo <- readr::read_csv("PID,Group,Session,Values
P1,F,1,14
P2,F,1,13
P3,F,1,16
P4,M,1,18
P5,F,1,20
P6,M,1,27
P7,M,1,19
P8,M,1,11
P9,F,1,28
P10,F,1,20
P11,F,1,24
P12,M,1,10
P1,F,2,26
P2,F,2,21
P3,F,2,19
P4,M,2,13
P5,F,2,26
P6,M,2,15
P7,M,2,23
P8,M,2,23
P9,F,2,30
P10,F,2,21
P11,F,2,11
P12,M,2,19")
Welcome to SO. Could you either create sample data or use dput(head(data, 20)) and post the output? You can also use one of the many built-in data sets in R. - tjebo
I am also not sure whether you really want geom_dotplot. Maybe geom_point instead? - tjebo
Thanks Tjebo, I am adding sample data right now! - Grace
P.S. I noticed that adding the line "geom_line(aes(x=factor(Session), group = PID), colour="dark grey") +" makes my lines separate from my bar graph (in case this might be a lead?). - Grace
P.S. to my answer: this is basically a nice exercise for learning ggplot, but I generally think this type of visualisation is rather confusing. In my own example from my very first question, I ended up completely changing the way I showed the data. Maybe try plotting the Session 1 value on the x-axis and the Session 2 value on the y-axis, using just geom_point, and fill/colour by Group. You will have all the information you want to show in a much more understandable way (and it is easier to plot). - tjebo

1 Answer

The trouble is that you want to dodge by several groups. With Group on the x-axis, both observations of a participant share the same x position, so geom_line does not know how to split the Group variable by session and draws a vertical line instead. Here are two ways to address this problem. Method 1 is probably the most "ggploty" way, and a neat way of adding another grouping without overcrowding the visualisation. For method 2 you need to change your x variable.

1) Use facet

2) Use interaction to split Session within each Group, and define the factor levels to get the right bar order

I have also used geom_point instead, because geom_dotplot is more of a specific type of histogram. I would generally recommend boxplots for plots of values like these, because bars are more appropriate for specific measures such as counts.
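The boxplot recommendation can be sketched roughly as follows. This is only an illustrative sketch: the sample data from the question is rebuilt inline so the block runs on its own, and the object name p_box and the choice to hide the boxplot's own outlier markers (the individual points are drawn anyway) are assumptions made for this example.

```r
library(ggplot2)

# Sample data copied from the question, rebuilt inline so the sketch is self-contained
data_foo <- data.frame(
  PID     = rep(paste0("P", 1:12), times = 2),
  Group   = rep(c("F", "F", "F", "M", "F", "M", "M", "M", "F", "F", "F", "M"), times = 2),
  Session = rep(1:2, each = 12),
  Values  = c(14, 13, 16, 18, 20, 27, 19, 11, 28, 20, 24, 10,
              26, 21, 19, 13, 26, 15, 23, 23, 30, 21, 11, 19)
)

# Treat Session as discrete so that each session gets its own box
data_foo$Session <- factor(data_foo$Session)

p_box <- ggplot(data_foo, aes(x = Session, y = Values)) +
  # outlier.shape = NA hides the boxplot outliers (assumption: avoids double-plotting points)
  geom_boxplot(aes(fill = Session), outlier.shape = NA) +
  # one line per participant, connecting Session 1 and Session 2
  geom_line(aes(group = PID), colour = "darkgrey") +
  geom_point(shape = 21, fill = "black") +
  # one panel per Group, as in Method 1
  facet_wrap(~Group)

p_box
```

The boxes summarise each session, while the grey lines still show the paired change for every participant.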

Method 1: Facets

library(ggplot2)
ggplot(data_foo, aes(x = Session, y = Values, fill = as.character(Session))) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +  # 'fun' replaces 'fun.y', deprecated since ggplot2 3.3.0
  geom_line(aes(group = PID)) +
  geom_point(aes(group = PID), shape = 21, color = 'black') +
  facet_wrap(~Group)

Created on 2020-01-20 by the reprex package (v0.3.0)

Method 2: Create an interaction term for your x variable. Note that you need to order the factor levels manually.

library(dplyr)  # provides %>% and mutate()
data_foo <- data_foo %>% mutate(new_x = factor(interaction(Group,Session), levels = c('F.1','F.2','M.1','M.2')))

ggplot(data_foo, aes(x = new_x, y = Values, fill = as.character(Session))) + 
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +  # 'fun' replaces 'fun.y', deprecated since ggplot2 3.3.0
  geom_line(aes(group = PID)) +
  geom_point(aes(group = PID), shape = 21, color = 'black') 
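The manual level ordering in Method 2 is needed because interaction() lets the first factor vary fastest by default. As a possible alternative (a sketch, not what the code above uses), base R's lex.order argument should give the desired order directly. The data frame below is rebuilt from the question's sample data so the snippet runs on its own.

```r
# Sample data copied from the question, rebuilt inline so the sketch is self-contained
data_foo <- data.frame(
  PID     = rep(paste0("P", 1:12), times = 2),
  Group   = rep(c("F", "F", "F", "M", "F", "M", "M", "M", "F", "F", "F", "M"), times = 2),
  Session = rep(1:2, each = 12),
  Values  = c(14, 13, 16, 18, 20, 27, 19, 11, 28, 20, 24, 10,
              26, 21, 19, 13, 26, 15, 23, 23, 30, 21, 11, 19)
)

# Default ordering: the first factor (Group) varies fastest
levels(interaction(data_foo$Group, data_foo$Session))
# "F.1" "M.1" "F.2" "M.2"

# lex.order = TRUE: the first factor (Group) varies slowest,
# so both sessions of a group end up next to each other
data_foo$new_x <- interaction(data_foo$Group, data_foo$Session, lex.order = TRUE)
levels(data_foo$new_x)
# "F.1" "F.2" "M.1" "M.2"
```

The resulting new_x column can be used as the x variable exactly as in the Method 2 plot.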


Either way, the result is not very visually compelling.
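The alternative mentioned in the comments on the question (Session 1 on the x-axis, Session 2 on the y-axis, coloured by Group) could look roughly like this. It is a sketch under a few assumptions: the data is reshaped to wide format with base R's reshape(), which produces the column names Values.1 and Values.2, and the dashed identity line and the object names wide and p_scatter are additions made for illustration.

```r
library(ggplot2)

# Sample data copied from the question, rebuilt inline so the sketch is self-contained
data_foo <- data.frame(
  PID     = rep(paste0("P", 1:12), times = 2),
  Group   = rep(c("F", "F", "F", "M", "F", "M", "M", "M", "F", "F", "F", "M"), times = 2),
  Session = rep(1:2, each = 12),
  Values  = c(14, 13, 16, 18, 20, 27, 19, 11, 28, 20, 24, 10,
              26, 21, 19, 13, 26, 15, 23, 23, 30, 21, 11, 19)
)

# One row per participant, with one column per session (Values.1, Values.2)
wide <- reshape(data_foo, idvar = c("PID", "Group"),
                timevar = "Session", direction = "wide")

p_scatter <- ggplot(wide, aes(x = Values.1, y = Values.2, colour = Group)) +
  # identity line (an addition for illustration): points above it increased
  # from Session 1 to Session 2, points below it decreased
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = "grey50") +
  geom_point(size = 3) +
  coord_equal() +
  labs(x = "Session 1", y = "Session 2")

p_scatter
```

Each participant becomes a single point, so no connecting lines or dodging are needed.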