I would like to summarize my karyotype molecular data by location and substrate (see sample data below) as percentages, in order to create a stacked bar plot in 'ggplot2'.
I have figured out how to use 'dcast' to get a count for each karyotype within each location/substrate combination, but I cannot figure out how to get a percentage for each of the three karyotypes (i.e. 'BB', 'BD', 'DD').
The result should be in a format that 'ggplot2' can use directly for the stacked bar plot.
Sample Data:
library(reshape2)
Karotype.Data <- structure(list(Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), .Label = c("Kampinge", "Kaseberga", "Molle", "Steninge"
), class = "factor"), Substrate = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
2L, 2L, 2L, 2L, 2L), .Label = c("Kampinge", "Kaseberga", "Molle",
"Steninge"), class = "factor"), Karyotype = structure(c(1L, 3L,
4L, 4L, 3L, 3L, 4L, 4L, 4L, 3L, 1L, 4L, 3L, 4L, 4L, 3L, 1L, 4L,
3L, 3L, 4L, 3L, 4L, 3L, 3L), .Label = c("", "BB", "BD", "DD"), class = "factor")), .Names = c("Location",
"Substrate", "Karyotype"), row.names = c(135L, 136L, 137L, 138L,
139L, 165L, 166L, 167L, 168L, 169L, 236L, 237L, 238L, 239L, 240L,
326L, 327L, 328L, 329L, 330L, 426L, 427L, 428L, 429L, 430L), class = "data.frame")
## Summary count for each karyotype ##
Karyotype.Summary <- dcast(Karotype.Data, Location + Substrate ~ Karyotype, value.var = "Karyotype", fun.aggregate = length)
## Convert the counts in columns 3 to 5 to row percentages ##
Karyotype.Summary[, 3:5] <- Karyotype.Summary[, 3:5] / rowSums(Karyotype.Summary[, 3:5]) * 100
– Marat Talipov
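To get from the percentage table to something 'ggplot2' can stack, the wide table probably needs to be melted back to long format. The following is only a minimal sketch of that idea, not a tested solution for the full data set: the data frame `toy` and the names `counts`, `kary.cols`, `percents`, `percent.long` and `p` are made up for illustration, and the karyotype columns are selected by name instead of by position, on the assumption that the number of karyotype columns can differ between data sets.

```r
library(reshape2)
library(ggplot2)

# Hypothetical toy data with the same three columns as the question
toy <- data.frame(
  Location  = c("A", "A", "A", "A", "B", "B"),
  Substrate = c("X", "X", "X", "X", "Y", "Y"),
  Karyotype = c("BB", "BD", "BD", "DD", "BD", "DD")
)

# Count each karyotype per Location/Substrate combination
counts <- dcast(toy, Location + Substrate ~ Karyotype,
                value.var = "Karyotype", fun.aggregate = length)

# Pick the karyotype columns by name rather than by position
kary.cols <- setdiff(names(counts), c("Location", "Substrate"))

# Divide each row by its total to get percentages
percents <- counts
percents[kary.cols] <- counts[kary.cols] / rowSums(counts[kary.cols]) * 100

# Back to long format: one row per Location/Substrate/Karyotype
percent.long <- melt(percents, id.vars = c("Location", "Substrate"),
                     variable.name = "Karyotype", value.name = "Percent")

# Stacked bars: one bar per Substrate, one panel per Location
p <- ggplot(percent.long, aes(x = Substrate, y = Percent, fill = Karyotype)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ Location)
```

Calling `print(p)` draws the plot. With real data, an empty-string karyotype level would get its own column and would count towards the row totals, so it may be worth dropping those rows before calling 'dcast'.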