3
votes

I want to make a stacked bar plot with the data unchanged; that is, I have already calculated the percentages to plot. According to the ggplot2 manual, "geom_col uses stat_identity: it leaves the data as is". However, it does not seem to be working, because the percentages in the plot differ from those in the sample data.

Download sample data from here.

Code is as follows:

library(ggplot2)

ggplot(data = df, aes(x = Pathway, y = value, fill = variable)) +
  scale_fill_manual(values = c("#005588", "#E69F00")) +
  # stat_identity(geom = "bar", width = 0.5) +
  geom_col(width = 0.5) +
  # geom_bar(stat = "identity", width = 0.5) +
  facet_grid(. ~ Timepoint) +
  coord_flip() +
  theme_bw()

geom_col changes the data and the row order

On the other hand, if I use the "stat_identity" option, the data remain unchanged (compare the percentages in both images with the sample data), but the bars are no longer stacked.

stat_identity does not touch the data but loses the stacked bars.

Is the "geom_col" option not working or am I doing something wrong? Should I use another plot method? Any help is appreciated.

dput:

structure(list(Pathway = c("Antigen Presentation Pathway", "Graft-versus-  Host Disease Signaling", 
"T Helper Cell Differentiation", "Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells", 
"Communication between Innate and Adaptive Immune Cells", "Antigen Presentation Pathway", 
"Graft-versus-Host Disease Signaling", "T Helper Cell Differentiation", 
"Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells", 
"Communication between Innate and Adaptive Immune Cells", "Antigen Presentation Pathway", 
"Graft-versus-Host Disease Signaling", "T Helper Cell Differentiation", 
"Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells", 
 "Communication between Innate and Adaptive Immune Cells", "Antigen Presentation Pathway", 
"Graft-versus-Host Disease Signaling", "T Helper Cell Differentiation", 
"Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells", 
"Communication between Innate and Adaptive Immune Cells", "Antigen Presentation Pathway", 
"Graft-versus-Host Disease Signaling", "T Helper Cell Differentiation", 
"Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells", 
"Communication between Innate and Adaptive Immune Cells", "Antigen Presentation Pathway", 
"Graft-versus-Host Disease Signaling", "T Helper Cell Differentiation", 
"Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells", 
"Communication between Innate and Adaptive Immune Cells"), Timepoint = c("15DPI", 
"15DPI", "15DPI", "15DPI", "15DPI", "30DPI", "30DPI", "30DPI", 
"30DPI", "30DPI", "45DPI", "45DPI", "45DPI", "45DPI", "45DPI", 
"15DPI", "15DPI", "15DPI", "15DPI", "15DPI", "30DPI", "30DPI", 
"30DPI", "30DPI", "30DPI", "45DPI", "45DPI", "45DPI", "45DPI", 
"45DPI"), variable = c("Targets", "Targets", "Targets", "Targets", 
"Targets", "Targets", "Targets", "Targets", "Targets", "Targets", 
"Targets", "Targets", "Targets", "Targets", "Targets", "DEGs", 
"DEGs", "DEGs", "DEGs", "DEGs", "DEGs", "DEGs", "DEGs", "DEGs", 
"DEGs", "DEGs", "DEGs", "DEGs", "DEGs", "DEGs"), value = c(2.63157894736842, 
4.16666666666667, 1.36986301369863, 3.125, 1.12359550561798, 
7.89473684210526, 18.75, 8.21917808219178, 18.75, 7.86516853932584, 
15.7894736842105, 16.6666666666667, 10.958904109589, 9.375, 8.98876404494382, 
44.7368421052632, 35.4166666666667, 43.8356164383562, 37.5, 31.4606741573034, 
47.3684210526316, 43.75, 42.4657534246575, 37.5, 33.7078651685393, 
52.6315789473684, 39.5833333333333, 39.7260273972603, 31.25, 31.4606741573034)), .Names = c("Pathway", "Timepoint", "variable", 
"value"), class = "data.frame", row.names = c(NA, -30L))
1
What is your expected output? The code you posted appears to be working for me. – Mike H.
Stacking bars is different than "leaving things as they are". You want the default position = 'stack', not the custom position = 'identity' you are using. – Gregor Thomas
@Mike H I need the original percentages from the sample data, like in the second plot, but stacked like the first plot. – fred
I don't want to download any files and read stuff in. Can you share a small sample of data that illustrates the problem? The data you dput isn't helpful because it has only one value of variable, only one row in the second Timepoint, and all unique values of Pathway, so there's nothing to stack. 8 rows (2 timepoints * 2 pathways * 2 variables) should be sufficient. – Gregor Thomas
@fred Maybe it's just me, but is that not what the first plot is doing? The second plot shows the DEGs only (which cover the Targets). The first looks to be stacking the percentages from the DEGs and the Targets. – Mike H.

1 Answer

4
votes

Given the discussion by you and Gregor in the comments above, it sounds like you do not want the plots stacked on each other, but rather overlaid. I believe this should work for you:

ggplot(data=df, aes(x = Pathway, y = value, fill = variable)) +
  scale_fill_manual(values = c("#005588", "#E69F00")) +
  geom_col(width = 0.5, alpha = 0.5, position = "identity") +
  facet_grid(. ~ Timepoint) +
  coord_flip() +
  theme_bw()


I use position = "identity" to make sure the bars don't stack. I also had to make the bars transparent with alpha = 0.5 so you can see them.
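
To see why the stacked plot appears to show different percentages, it can help to work out the bar endpoints by hand. The following is a minimal base R sketch, not ggplot2's actual implementation; it uses two rows taken from the dput above, and the column names `stack_end` and `identity_end` are made up for illustration.

```r
# Two rows from the sample data: one pathway at one timepoint.
small <- data.frame(
  Pathway   = "Antigen Presentation Pathway",
  Timepoint = "15DPI",
  variable  = c("Targets", "DEGs"),
  value     = c(2.63157894736842, 44.7368421052632)
)

# With position = "stack", segments are laid end to end, so the far edge
# of the second segment sits at the running total of the values.
small$stack_end <- cumsum(small$value)

# With position = "identity", every bar starts at 0 and ends at its own value.
small$identity_end <- small$value

print(small[, c("variable", "value", "stack_end", "identity_end")])
```

In both cases the values themselves are untouched; only the starting point of each segment changes. The order in which ggplot2 places the segments may differ from this sketch, but the total length of the stacked bar (about 47.4 here) does not.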


Another option if you want to have them plotted side by side instead of stacking is to use position = "dodge":

ggplot(data=df, aes(x = Pathway, y = value, fill = variable)) +
  scale_fill_manual(values = c("#005588", "#E69F00")) +
  geom_col(width=0.5, position = "dodge") +
  facet_grid(. ~ Timepoint) +
  coord_flip() +
  theme_bw()
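
If you want to confirm that dodging leaves the values alone, one way is to inspect what ggplot2 actually draws with `layer_data()`. This is only a sketch on a two-row subset of the dput above; the object names `small`, `p` and `drawn` are illustrative, and it assumes ggplot2 is installed.

```r
library(ggplot2)

# One pathway at one timepoint, taken from the sample data.
small <- data.frame(
  Pathway  = "Antigen Presentation Pathway",
  variable = c("Targets", "DEGs"),
  value    = c(2.63157894736842, 44.7368421052632)
)

p <- ggplot(small, aes(x = Pathway, y = value, fill = variable)) +
  geom_col(width = 0.5, position = "dodge")

# layer_data() returns the data frame the layer is drawn from,
# after the position adjustment has been applied.
drawn <- layer_data(p)

# Each bar still runs from 0 to its own value; only xmin/xmax differ.
print(drawn[, c("xmin", "xmax", "ymin", "ymax")])
```

The same inspection can be repeated with position = "stack" or position = "identity" to compare where each bar starts and ends.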
