I have two categorical variables that I want to compare through cross-tabulation. I made a dummy example of an extended square contingency table in which the categories of X are in the table’s rows and the same sequence of categories for Y is in the table’s columns. The table summarizes the association between X and Y.
The table’s diagonal entries give the number of observations for which the X category matches the Y category, in which case the observations are Hits for the category. Each off-diagonal entry is a False Alarm for the X category and a Miss for the Y category.
[Image: extended square contingency table]
I want to make a stacked bar plot like graph1 that shows each row of the table as a bar, with the fill colors coming from the columns (the Y variable), and with the last two bars showing the misses and false alarms for each category.
I managed to make two separate graphs. The code below generates the first four rows of graph1.
library(ggplot2)
library(reshape2)  # provides melt()
# Create dummy input
sample.mtx <- matrix(c(1,0,2,0,0,3,0,3,2,3,3,0,0,0,0,3), nrow = 4)
categories <- c("A","B","C","D")
colnames(sample.mtx) <- categories
rownames(sample.mtx) <- categories
# Change from wide to long format
g1.df <- melt(sample.mtx)
# Zero sizes were causing problem so I removed them.
g1.df <- g1.df[g1.df$value!=0,]
# Add a label column to show "Hit".
g1.df$label <- ifelse(g1.df$Var1 == g1.df$Var2, "Hit", "")
# Plotting
plot1 <- ggplot(data = g1.df, mapping = aes(fill = Var2, y = value, x = Var1, label = label)) +
  geom_bar(width = 0.6, position = "stack", stat = "identity") +
  labs(x = "Table Feature", y = "Entry size as the number of observations", title = "Entry Size") +
  geom_text(size = 4, position = position_stack(vjust = 0.5)) +
  coord_flip() +
  theme_bw() +
  scale_x_discrete(limits = c("D", "C", "B", "A")) +
  # value is numeric, so use a continuous scale with one tick per observation
  scale_y_continuous(breaks = seq(0, 10, 1), limits = c(0, 10)) +
  theme(plot.title = element_text(family = "Times", color = "#353535",
                                  face = "bold", size = 12, hjust = 0.5)) +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank()
  )
plot1 shows the result. The problem is that the order of the components in the stacked bars is not right. I'd be grateful if someone could explain how I can arrange the components of each bar.
To plot the Misses and False Alarms, this is what I coded:
# Hits, False Alarm, and Miss
hits <- diag(sample.mtx)
false.alarms <- rowSums(sample.mtx) - hits
misses <- colSums(sample.mtx) - hits
# Make a data frame (data.frame() keeps misses and false.alarms numeric;
# cbind() would coerce every column to character)
g1.df1 <- data.frame(categories, misses, false.alarms)
# Change it to long format and get rid of zero sizes.
g1.df1.m <- melt(g1.df1, id.vars = "categories")
g1.df1.m <- g1.df1.m[g1.df1.m$value != 0, ]
# Plotting
plot2 <- ggplot(data = g1.df1.m, mapping = aes(fill = categories, y = value, x = variable)) +
  geom_bar(width = 0.6, position = "stack", stat = "identity") +
  coord_flip() +
  theme_bw() +
  theme(legend.position = "none") +
  scale_x_discrete(limits = c("misses", "false.alarms")) +
  # value is numeric, so use a continuous scale with one tick per observation
  scale_y_continuous(breaks = seq(0, 10, 1), limits = c(0, 10))
I am happy with this plot, but what I want is to have both plot1 and plot2 in one plot, as shown in graph1. Can anyone please provide guidance on how to draw stacked bar plots from different data frames? Or is there a better way to make graph1?
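One direction I have been considering, as a rough sketch rather than a confirmed solution, is to avoid using two data frames at all. Both long-format tables would be stacked into a single data frame with a shared bar column, so one ggplot call draws all six bars. The names `cells`, `summary.bars`, `combined`, and `combined.plot`, and the column names `bar` and `fill`, are made up for illustration. I use base R's `as.data.frame(as.table(...))` here in place of `melt()` only to keep the sketch self-contained.

```r
library(ggplot2)

# Same dummy table as above (rows = X, columns = Y)
categories <- c("A", "B", "C", "D")
sample.mtx <- matrix(c(1,0,2,0,0,3,0,3,2,3,3,0,0,0,0,3), nrow = 4,
                     dimnames = list(categories, categories))

hits         <- diag(sample.mtx)
false.alarms <- rowSums(sample.mtx) - hits
misses       <- colSums(sample.mtx) - hits

# Long format for the four category bars: one row per table cell
cells <- as.data.frame(as.table(sample.mtx), stringsAsFactors = FALSE)
names(cells) <- c("bar", "fill", "value")

# Long format for the two summary bars, filled by category
# ("Misses" and "False Alarms" are hypothetical bar labels)
summary.bars <- data.frame(
  bar   = rep(c("Misses", "False Alarms"), each = length(categories)),
  fill  = rep(categories, times = 2),
  value = unname(c(misses, false.alarms)),
  stringsAsFactors = FALSE
)

# One data frame for all six bars; drop empty segments
combined <- rbind(cells, summary.bars)
combined <- combined[combined$value != 0, ]
combined$label <- ifelse(combined$bar == combined$fill, "Hit", "")

# Factor levels fix the bar order; after coord_flip() the first level
# is drawn at the bottom of the plot
combined$bar <- factor(combined$bar,
                       levels = c("False Alarms", "Misses", rev(categories)))

combined.plot <- ggplot(combined,
                        aes(x = bar, y = value, fill = fill, label = label)) +
  geom_col(width = 0.6) +
  geom_text(size = 4, position = position_stack(vjust = 0.5)) +
  scale_y_continuous(breaks = seq(0, 10, 1)) +
  coord_flip() +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())
```

Printing `combined.plot` should draw the four category bars followed by the two summary bars. As far as I understand, the order of the segments within each bar follows the factor levels of the fill variable, and passing `position_stack(reverse = TRUE)` to both `geom_col()` and `geom_text()` flips it, so that may also be a way to address the ordering problem.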