I have two categorical variables that I want to compare through cross-tabulation. I made a dummy example of an extended square contingency table where the categories of X are in the table’s rows, while the same sequence of categories for Y is in the table’s columns. The table summarizes the association between X and Y.

The table’s diagonal entries give the number of observations for which the X category matches the Y category, in which case the observations are Hits for the category. Each off-diagonal entry is a False Alarm for the X category and a Miss for the Y category.

extended square contingency table

Graph1

I want to make a stacked barplot like graph1 that shows each row of the table, with the fill colors coming from the columns (the Y variable), and with the last two bars showing the misses and false alarms for each category.

I managed to make two separate graphs. The code below generates the first four rows of graph1.

# Load the packages used below
library(ggplot2)   # plotting
library(reshape2)  # melt()

# Create Dummy Input
sample.mtx <- matrix(c(1,0,2,0,0,3,0,3,2,3,3,0,0,0,0,3), nrow = 4)
categories <- c("A","B","C","D")
colnames(sample.mtx) <- categories
rownames(sample.mtx) <- categories

# Change from wide to long format
g1.df <- melt(sample.mtx)
# Zero sizes were causing problem so I removed them.
g1.df <- g1.df[g1.df$value!=0,]

# Add a label column to show "Hit" on the diagonal entries (empty label elsewhere).
g1.df$label <- ifelse(g1.df$Var1==g1.df$Var2, "Hit", "")

# Plotting
plot1 <- ggplot(data=g1.df, mapping=aes(fill=Var2, y=value, x=Var1, label = label))+
  geom_bar(width = 0.6, position="stack", stat="identity")+
  labs(x="Table Feature", y="Entry size as the number of observations", title="Entry Size") +
  geom_text(size = 4, position = position_stack(vjust = 0.5))+
  coord_flip()+
  theme_bw()+
  scale_x_discrete(limits = c("D", "C", "B", "A"))+
  scale_y_continuous(breaks=seq(0,10,1))+  # value is numeric, so use a continuous scale
  theme(plot.title = element_text(family = "Times", color = "#353535", 
                                  face = "bold", size = 12, hjust = 0.5))+
  theme(legend.position = "bottom", legend.title = element_blank())+
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank()
  ) 

Plot1 shows the result. The problem is that the order of the components in the stacked bars is not right. I'll be grateful if someone can explain how I can arrange the components of the bars.

To plot Misses and False Alarms this is what I coded:

# Hits, False Alarm, and Miss
hits <- diag(sample.mtx)
false.alarms <- rowSums(sample.mtx) - hits
misses <- colSums(sample.mtx) - hits

# Make a data frame (data.frame() keeps misses and false.alarms numeric;
# cbind() would coerce every column to character)
g1.df1 <- data.frame(categories, misses, false.alarms)

# Change it to long format and get rid of zero sizes.
g1.df1.m <- melt(g1.df1, id.vars="categories")
g1.df1.m <- g1.df1.m[g1.df1.m$value!=0,]

# Plotting
plot2 <- ggplot(data=g1.df1.m, mapping=aes(fill=categories, y=value, x=variable))+
  geom_bar(width = 0.6, position="stack", stat="identity")+
  coord_flip()+
  theme_bw()+
  theme(legend.position = "none")+
  scale_x_discrete(limits = c("misses", "false.alarms"))+
  scale_y_continuous(breaks=seq(0,10,1))

I am happy with plot2. But what I want is to have both plot1 and plot2 in one plot, as shown in graph1. Can anyone please provide guidance on how to draw stacked bar plots from different data frames? Or is there a better way to make graph1?

I don't want to discourage you, but is Graph1 definitely what you want? I'm not super well-acquainted with classification literature, but I find Graph1 pretty hard to read. Maybe a better option would be a balloonplot, e.g.: sthda.com/english/articles/32-r-graphics-essentials/…? - Adam B.

1 Answer


"How can I arrange the components of the bar?"

The trick is to use the levels attribute of a column of type factor (`categories` here, for example). You need the order of the levels to be reversed.
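As a minimal sketch of that idea in base R (the vector `x` is a hypothetical stand-in for the `categories` column, not data from the question), reversing the levels changes only the order in which ggplot2 would arrange the categories, not the values themselves:

```r
# Hypothetical toy factor standing in for the `categories` column
x <- factor(c("A", "B", "C", "D", "B"))
levels(x)        # "A" "B" "C" "D"

# Rebuild the factor with the same values but the level order reversed
x.rev <- factor(x, levels = rev(levels(x)))
levels(x.rev)    # "D" "C" "B" "A"

# The underlying values are unchanged; only the level order differs
identical(as.character(x), as.character(x.rev))   # TRUE
```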

"I want is to have both plot1 and plot 2 in one plot like I shown in graph1."

If you just want to reproduce Graph1 with your code, this works:

#-------------------
#Data wrangling
library(dplyr) # for %>% and bind_rows()
colnames(g1.df)[1] <- "categories"; colnames(g1.df)[2] <- "variable" # rename the columns to match the 2nd df
g1.df1.m[,3] <- as.numeric(g1.df1.m[,3]) # make sure the value column is numeric, as the corresponding column in `g1.df` is numeric
g1.dfCombined <- g1.df %>% bind_rows(g1.df1.m) # merge the two dfs
#this is the part that reverses the order (the levels are given explicitly,
#because bind_rows() may return `categories` as character rather than factor):
g1.dfCombined$categories <- factor(g1.dfCombined$categories, levels = rev(categories))

#-------------------
#Plotting: (all same except dropped `scale_x_discrete(limit = c("D", "C", "B", "A"))`)
ggplot(data=g1.dfCombined, mapping=aes(fill=categories, y=value, x=variable, label = label))+
 geom_bar(width = 0.6, position="stack", stat="identity")+
 labs(x="Table Feature", y="Entry size as the number of observations", title="Entry Size") +
 geom_text(size = 4, position = position_stack(vjust = 0.5)) +
 coord_flip()+theme_bw() + scale_y_continuous(breaks=seq(0,10,1)) +
 theme(plot.title = element_text(family = "Times", color = "#353535", face = "bold", 
 size = 12, hjust = 0.5)) +
theme(legend.position = "bottom", legend.title = element_blank()) +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank())


NOTE: It's probably a good idea to use different values (e.g. use 1, 2, 3, ... instead of A, B, C), as the latter are used twice, in the two different columns `categories` and `variable`, which can create confusion.
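One way to sidestep that naming confusion is to build the combined long table directly from the matrix, with column names that say what each column does in the plot. The following is only a sketch under some assumptions: the column names `bar` and `fill`, the order of the bars, and the explicit `group = fill` are illustrative choices of mine, not part of the original code. Setting `group` explicitly keeps the `label` column out of ggplot2's default grouping, so the stacking order follows the levels of `fill` alone.

```r
# Dummy contingency table from the question (rows = X, columns = Y)
categories <- c("A", "B", "C", "D")
sample.mtx <- matrix(c(1,0,2,0,0,3,0,3,2,3,3,0,0,0,0,3), nrow = 4,
                     dimnames = list(categories, categories))

# One row per table cell: `bar` is the X category (row), `fill` the Y category (column)
cells <- as.data.frame(as.table(sample.mtx), stringsAsFactors = FALSE)
names(cells) <- c("bar", "fill", "value")
cells$label <- ifelse(cells$bar == cells$fill, "Hit", "")

# Misses (per column) and false alarms (per row), as two extra bars
hits <- diag(sample.mtx)
errors <- rbind(
  data.frame(bar = "misses", fill = categories,
             value = unname(colSums(sample.mtx) - hits), label = ""),
  data.frame(bar = "false.alarms", fill = categories,
             value = unname(rowSums(sample.mtx) - hits), label = "")
)

# Combine, drop empty entries, and fix the order of bars and fill colors
combined <- rbind(cells, errors)
combined <- combined[combined$value != 0, ]
rownames(combined) <- NULL
combined$bar  <- factor(combined$bar,
                        levels = c("false.alarms", "misses", rev(categories)))
combined$fill <- factor(combined$fill, levels = rev(categories))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(combined, aes(x = bar, y = value, fill = fill,
                            label = label, group = fill)) +
    geom_col(width = 0.6) +
    geom_text(size = 4, position = position_stack(vjust = 0.5)) +
    coord_flip() +
    scale_y_continuous(breaks = 0:10) +
    theme_bw() +
    theme(legend.position = "bottom", legend.title = element_blank())
  print(p)
}
```

With `coord_flip()`, the first level of `bar` is drawn at the bottom, so the level order above puts A at the top and the two error bars last. Reorder the levels if a different arrangement is wanted.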