The following is a sample data frame:
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1 , 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"))
My question started off as seemingly simple, but I could not find a way to edit the dataframe suitably to plot a barplot.
For Var1, I want to plot a stacked barplot of the percentage of samples in which Var1 was present (i.e., Var1 > 0) or absent, and similarly for Var2 and so on.
I could determine this percentage by:
(1 - sum(df$Var1 == 0) / length(df$Var1)) * 100
But how do I convert this into a percentage while plotting? I looked at many melt options, but there is no unifying criterion for these variables that would give them a common x-axis.
Finally, how would one answer the question above if I want to plot 5 variables from a data frame with 1000 such columns?
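One possible way to get a common x-axis, offered only as a minimal sketch: treat the variable name itself as the x value and compute the present/absent percentages before plotting. The sketch below assumes "present" means a value greater than 0, as stated above. The names `vars`, `pct_present` and `pct_long` are illustrative and not part of the original data; with 1000 columns, `vars` would simply list the 5 columns of interest.

```r
library(ggplot2)

# Numeric columns from the sample data frame above
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2))

# Assumption: the columns to plot are chosen by name
vars <- c("Var1", "Var2")

# Percent of samples in which each chosen variable is > 0
pct_present <- colMeans(df[, vars, drop = FALSE] > 0) * 100

# Long format: one row per variable/status pair, so the variable
# name can serve as the shared x-axis
pct_long <- data.frame(
  Variable = rep(vars, times = 2),
  Status   = rep(c("Present", "Absent"), each = length(vars)),
  Percent  = unname(c(pct_present, 100 - pct_present))
)

# Stacked bars; each bar sums to 100
ggplot(pct_long, aes(Variable, Percent, fill = Status)) +
  geom_col()
```

Because the percentages are computed up front, `geom_col()` only has to stack two precomputed values per variable, and no melting of the original data frame is needed.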
Edit: Thanks for the answers so far! I have a slight edit to the question: I just added one more variable to my data frame.
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1 , 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))
I am trying to figure out how to plot the barplot for cases and controls, with presence/absence stacked within them, for Var1PA, Var2PA and so on. If I had the right data frame input, the ggplot2 code would be:
vars <- c('Var1PA', 'Var2PA', 'Var2PA')
##based on the first comment by @rawr
tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100)
ggplot(tt, aes(Disease, Freq)) +
geom_bar(aes(fill = Var1), position = "stack", stat="identity") + facet_grid(~vars)
How do I get percentages for cases (present and absent) and controls (present and absent) for each of the vars? Thanks!
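A minimal sketch of one way to build that input, assuming the percentages should be computed within each Disease group for each variable: cross-tabulate each presence/absence column against `Disease`, normalise each Disease column, and stack the results into one long data frame. The column names `Variable`, `Status` and `Percent` are illustrative choices, not names from the original data.

```r
library(ggplot2)

# Presence/absence columns and Disease from the edited data frame above
df <- data.frame(
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control",
              "Case", "Control", "Case", "Control"))

vars <- c("Var1PA", "Var2PA")

# For each variable: status-by-Disease table, converted to percentages
# within each Disease group (margin = 2 normalises the columns)
tt <- do.call(rbind, lapply(vars, function(v) {
  tab <- prop.table(table(Status = df[[v]], Disease = df$Disease),
                    margin = 2) * 100
  cbind(Variable = v, as.data.frame(tab, responseName = "Percent"))
}))

# One facet per variable, one stacked bar per Disease group
ggplot(tt, aes(Disease, Percent, fill = Status)) +
  geom_col() +
  facet_grid(~ Variable)
```

Here `tt` has one row per variable, status and Disease combination, so `Disease` exists as a column for the x-axis and `Variable` can drive the facets.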
vars <- c('Var1PA', 'Var2PA', 'Var2PA'); tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100); ggplot(tt, aes(Var2, Freq, fill = Var1)) + geom_bar(stat = 'identity')
– rawr
library(tidyverse); df %>% gather(var, pa, ends_with('PA')) %>% group_by(var) %>% do(pa = names(table(.$pa)), pct = prop.table(table(.$pa)) * 100) %>% unnest() %>% ggplot(aes(var, pct, fill = pa)) + geom_bar(stat = 'identity')
– alistaire