I'm trying to plot a data set containing over 2000 samples as a stacked bar chart, with each sample (identified by "SampleID") on the x-axis and 6 measurement values (Measurement1-6) on the y-axis. I want the samples to be ordered by the measurement variables in the following sequence: Measurement4, 1, 5, 2, 3, and 6, and from highest to lowest measurement value. Below is a subset of 15 samples as an example of what I'm working with, which I'll refer to as the "dummy_set" data frame:

    SampleID Measurement1 Measurement2 Measurement3 Measurement4 Measurement5 Measurement6
1         A         0.05         0.00         0.95         0.00          0.0         0.00
2         B         0.00         0.00         0.43         0.56          0.0         0.01
3         C         0.64         0.36         0.00         0.00          0.0         0.00
4         D         0.00         0.82         0.18         0.00          0.0         0.00
5         E         0.00         0.60         0.00         0.40          0.0         0.00
6         F         0.80         0.00         0.00         0.20          0.0         0.00
7         G         0.00         0.00         0.00         1.00          0.0         0.00
8         H         0.00         0.00         0.00         1.00          0.0         0.00
9         I         0.00         0.00         1.00         0.00          0.0         0.00
10        J         0.00         0.00         1.00         0.00          0.0         0.00
11        K         0.25         0.00         0.00         0.45          0.3         0.00
12        L         0.10         0.00         0.00         0.10          0.8         0.00
13        M         0.19         0.10         0.00         0.70          0.0         0.01
14        N         0.90         0.00         0.00         0.10          0.0         0.00
15        O         0.00         0.10         0.40         0.00          0.5         0.00

Here are the basics of what I've done:

  1. Melt the data set: melt_dummy_set <- melt(dummy_set, id.vars = "SampleID")

    Where the melted data set looks like this:

    head(melt_dummy_set)
        SampleID     variable value
    1         A Measurement1  0.05
    2         B Measurement1  0.00
    3         C Measurement1  0.64
    4         D Measurement1  0.00
    5         E Measurement1  0.00
    6         F Measurement1  0.80
    
  2. Plot the melted data set using ggplot() and geom_bar():

    ggplot(melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
      geom_bar(stat = "identity")

Original stacked bar chart

As you can see, the samples are plotted in the original order that they were listed (A-O). However, I want them to be plotted in the following order: G, H, M, B, K, N, F, C, L, O, D, E, I, J, and A.

Based on other similar Stack Overflow questions, I've gathered that I need to relevel/re-establish the factors in the order I want. Here is what I've tried so far:

#Attempt 1
reordered_melt_dummy_set <- transform(melt_dummy_set, variable = reorder(variable, -value))
ggplot(reordered_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity")

Attempt 1 results

#Attempt 2
copy_melt_dummy_set <- melt_dummy_set
copy_melt_dummy_set$variable <- factor(copy_melt_dummy_set$variable, levels = c("Measurement4", "Measurement5", "Measurement1", "Measurement2", "Measurement3", "Measurement6"))
ggplot(copy_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity")

Attempt 2 results

My 3rd attempt resulted in multiple errors (shown in the "##" lines immediately after each line of code):

#Attempt 3
copy2_melt_dummy_set <- melt_dummy_set

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$value), "variable"])
##Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated

copy2_melt_dummy_set$variable <- factor(copy2_melt_dummy_set$variable, levels = copy2_melt_dummy_set[order(copy2_melt_dummy_set$value), "variable"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$variable), "SampleID"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [16] is duplicated
## In addition: Warning message: In Ops.factor(copy2_melt_dummy_set$variable) : ‘-’ not meaningful for factors

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$value), "SampleID"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [16] is duplicated

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$value), "value"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated

#Attempt 4
copy3_melt_dummy_set <- melt_dummy_set[order(melt_dummy_set$variable, -melt_dummy_set$value), ]
head(copy3_melt_dummy_set)
ggplot(copy3_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity")

Attempt 4 results

#Attempt 5
ggplot(melt_dummy_set[order(melt_dummy_set$variable, -melt_dummy_set$value), ], aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity")

Attempt 5 results

#Attempt 6
new_melt_dummy_set <- within(melt_dummy_set, 
                             variable <- factor(variable, levels = names(sort(table(variable), decreasing = TRUE))))
ggplot(new_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity")

Attempt 6 results

#Attempt 7
copy4_melt_dummy_set <- melt_dummy_set
custom_leveling <- unique(copy4_melt_dummy_set$variable)
copy4_melt_dummy_set$variable <- factor(copy4_melt_dummy_set$variable, levels = custom_leveling)
ggplot(copy4_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity")

Attempt 7 results

In all cases I can't get the actual samples on the x-axis to be reorganized. I feel there's probably a simple fix for this, but I can't figure out what I'm doing wrong. Any suggestions?

Edited

In response to the possible duplicate comment, I tried applying the solutions from "Order Bars in ggplot2 bar graph", but none of them produced the plot in the order I wanted. Here is the code I tried:

#First solution
new_melt_dummy_set <- within(melt_dummy_set, 
                             variable <- factor(variable, levels = names(sort(table(variable), decreasing = TRUE))))
ggplot(new_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity")

#Second solution
ggplot(melt_dummy_set, aes(x = reorder(SampleID, variable, function(x)-length(x)), y = value, fill = variable)) + geom_bar(stat = "identity")
ggplot(melt_dummy_set, aes(x = reorder(variable, SampleID, function(x)-length(x)), y = value, fill = variable)) + geom_bar(stat = "identity")

#Third solution
ordered_measurements <- c("Measurement4", "Measurement1", "Measurement5", "Measurement2", "Measurement3", "Measurement6")
ggplot(melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) + 
  geom_bar(stat = "identity") + 
  scale_x_discrete(limits = ordered_measurements)

#Fourth solution
ggplot(melt_dummy_set, aes(x = reorder(SampleID, -table(variable)[variable]), y = value, fill = variable)) + geom_bar(stat = "identity")

require(forcats)
ggplot(melt_dummy_set, aes(x = SampleID, fill = fct_infreq(variable), y = value)) + geom_bar(stat = "identity")
ggplot(melt_dummy_set, aes(x = fct_infreq(variable))) + geom_bar(stat = "identity")

#Fifth solution
library(tidyverse)
library(forcats)
melt_dummy_set %>%
  mutate(variable = fct_reorder(variable, value, .desc = TRUE)) %>%
  ggplot(aes(x = SampleID, y = value, fill = variable)) + geom_bar(stat = 'identity')

#Sixth solution
library(dplyr)
melt_dummy_set %>%
  group_by(variable) %>%                              
  summarize(counts = n()) %>%
  arrange(-counts) %>%                                
  mutate(SampleID = factor(SampleID, variable)) %>%   
  ggplot(aes(x = SampleID, y = value, fill = variable)) +                  
  geom_bar(stat = "identity")                         

melt_dummy_set %>%
  group_by(SampleID) %>%                              
  summarize(counts = n()) %>%
  arrange(-counts) %>%                                
  mutate(SampleID = factor(SampleID, value)) %>%   
  ggplot(aes(x = SampleID, y = value, fill = variable)) +                 
  geom_bar(stat = "identity")

#Seventh solution
new_meltedDummy_set <- transform(melt_dummy_set,
                       variable = ordered(variable, levels = names(sort(-table(variable)))))
ggplot(new_meltedDummy_set, aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity")

Possible duplicate of Order Bars in ggplot2 bar graph - PavoDive
I'm editing some of your code to keep it as small as possible, see: stackoverflow.com/questions/56747471/… - PavoDive
Thank you for sharing the links, especially the how to ask questions page, it's really helpful! I tried applying the solutions from the stackoverflow.com/questions/5208679/… link, but they didn't work for me. I edited my post to show what I tried. Please let me know if there's anything else I can clarify. - Natasha

1 Answer

Is this what you were going for? I think you were close. Instead of releveling the variable column (the measurement names), you need to turn SampleID into a factor whose levels follow the order of the measurement values. This is what happens in the line where sample_order is calculated:

library(tidyverse)

dummy_set <- tibble(
  SampleID = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O"),
  Measurement1 = c(0.05, 0, 0.64, 0, 0, 0.8, 0, 0, 0, 0, 0.25, 0.1, 0.19, 0.9, 0),
  Measurement2 = c(0, 0, 0.36, 0.82, 0.6, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0.1),
  Measurement3 = c(0.95, 0.43, 0, 0.18, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0.4),
  Measurement4 = c(0, 0.56, 0, 0, 0.4, 0.2, 1, 1, 0, 0, 0.45, 0.1, 0.7, 0.1, 0),
  Measurement5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0.8, 0, 0, 0.5),
  Measurement6 = c(0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0, 0)
)

sample_order <- dummy_set %>%
  arrange(
    desc(Measurement4), desc(Measurement1), desc(Measurement5),
    desc(Measurement2), desc(Measurement3), desc(Measurement6)
  ) %>%
  pull(SampleID)

melt_dummy_set <- dummy_set %>%
  gather(variable, value, -SampleID)

reordered_melt_dummy_set <- melt_dummy_set %>%
  mutate(SampleID = factor(SampleID, levels = sample_order))

plot_ordered <- ggplot(reordered_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(expand = c(0,0)) +
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    panel.border = element_blank(),
    panel.background = element_blank()
  )

plot_ordered
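
A side note on the "factor level [2] is duplicated" errors from Attempt 3: factor() requires its levels to be unique, and in the melted data every SampleID appears once per measurement, so indexing the melted frame with order() returns each ID several times. Computing the order from the wide dummy_set, as above, avoids this. The toy vectors below (ids and value are made-up names, not taken from the question) are only a sketch of the failure and of one possible workaround with unique():

```r
# Toy long-format data: three IDs, each appearing twice (hypothetical values).
ids   <- rep(c("A", "B", "C"), times = 2)
value <- c(0.1, 0.9, 0.5, 0.3, 0.2, 0.8)

# Ordering the long vector repeats IDs, so factor() rejects the levels.
res <- try(factor(ids, levels = ids[order(-value)]), silent = TRUE)
inherits(res, "try-error")

# Keeping only the first occurrence of each ID gives valid, unique levels.
f <- factor(ids, levels = unique(ids[order(-value)]))
levels(f)
```

With unique(), each ID is ranked by its single largest value, which may or may not be the ordering you are after.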

Created on 2019-07-26 by the reprex package (v0.3.0)
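
One thing to check: arrange() sorts every sample by Measurement4 first, so E (0.40 in Measurement4) lands ahead of N and F, whereas the order listed in the question puts E near the end. If the intended rule is "group samples by their largest measurement, in the priority 4, 1, 5, 2, 3, 6, then sort each group from highest to lowest", which is my reading of the question and not something it states outright, a base R sketch could look like this (priority, dominant and dominant_order are names I made up):

```r
# Wide data from the question.
dummy_set <- data.frame(
  SampleID = LETTERS[1:15],
  Measurement1 = c(0.05, 0, 0.64, 0, 0, 0.8, 0, 0, 0, 0, 0.25, 0.1, 0.19, 0.9, 0),
  Measurement2 = c(0, 0, 0.36, 0.82, 0.6, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0.1),
  Measurement3 = c(0.95, 0.43, 0, 0.18, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0.4),
  Measurement4 = c(0, 0.56, 0, 0, 0.4, 0.2, 1, 1, 0, 0, 0.45, 0.1, 0.7, 0.1, 0),
  Measurement5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0.8, 0, 0, 0.5),
  Measurement6 = c(0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0, 0),
  stringsAsFactors = FALSE
)

# Assumed priority of the measurement columns, taken from the question text.
priority <- c("Measurement4", "Measurement1", "Measurement5",
              "Measurement2", "Measurement3", "Measurement6")

# Matrix with columns in priority order.
m <- as.matrix(dummy_set[, priority])

# Position (within priority) of each sample's largest value;
# ties go to the higher-priority column.
dominant <- max.col(m, ties.method = "first")
dominant_value <- m[cbind(seq_len(nrow(m)), dominant)]

# Sort by dominant measurement, then by its value, highest first.
dominant_order <- dummy_set$SampleID[order(dominant, -dominant_value)]
paste(dominant_order, collapse = " ")
# "G H M B K N F C L O D E I J A"
```

To use it, pass dominant_order instead of sample_order as the levels in factor(SampleID, levels = ...) before plotting; the rest of the code above stays the same.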