Using ggplot2, I can plot a boxplot with points superimposed on it, but the points within each box all fall on a single vertical line.

library(ggplot2)

example_data <- data.frame(cohort = c("ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC"), 
                           sample = c("A5LI", "A5JQ", "A5JP", "A5LE", "A5LG", "A5JV", "A5JD", "A5J8", "A5K8", "A5L3", "AA33", "AA30", "AA2T", "A95A", "AAZT", "A8I3", "AAV9", "A8Y4", "A8Y8", "AA31", "AAAT", "A9U4", "A7Q1", "A7DS", "A9TV", "A4D5", "A9TY", "A7CX", "A9TW", "A86F"), 
                           count = c(50, 5, 65, 22, 18, 25, 27, 86, 24, 20, 48, 96, 60, 27, 81, 34, 43, 58, 31, 77, 160, 31, 157, 104, 84, 53, 153, 111, 278, 105))


ggplot(example_data, aes(cohort, count)) + 
  geom_boxplot(aes(color = cohort)) + 
  geom_point(aes(color = cohort)) +
  scale_y_log10() +
  labs(x = NULL) +
  theme(axis.line.x = element_blank(), axis.ticks.x = element_blank(), 
        axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5), legend.position = 'none')

How could I reorder the points according to their y values ("count" in example_data), like in this plot?

plot

Hi OP, can you please provide an example dataset we could use and help to define your explicit question? What part of creating the plot you show is giving you the most trouble? What have you tried so far to get that to work? - chemdork123
Thanks, I've edited my question to make it clearer. - 廖健龙

1 Answer

If you look at the scales in the example plot of your desired output, there are basically two different layers:

  1. Overall: the x axis is a category ("DKFZ", "Sanger", "SMuFin"...) and the y axis is the value used for the boxplot.

  2. Within each boxplot: the x axis is some other continuous value and the y axis is the same value used as the y axis in the boxplot.

This means that the x axis within each boxplot is different from the x axis used for the plot as a whole. You kind of want a "secondary x axis". Setting aside whether this is a good idea, I can show you one approach for doing this in ggplot2.

ggplot2 has no built-in way to give each category on a discrete x axis its own continuous x axis (the built-in sec_axis() only draws a one-to-one transformation of the primary axis). However, since one of your desired axes is categorical/discrete (example_data$cohort) and the other is continuous (example_data$count), we can simulate the effect of two x axes with some clever formatting of facets.

The general idea is to split your plot into facets by cohort, so that each facet holds one cohort's boxplot with that cohort's points drawn on top. Here the x axis value is count, the same as the y axis value. I assume that in your real data the two would not be the same, but it works for example purposes. Then we can use some theme elements and the options for the facet labels (the strip.* elements in ggplot2) to simulate the look of a single plot. I'm also switching to theme_classic(), since otherwise you have to deal with x gridlines that won't make sense in the final plot. If you want the vertical lines, you'll have to place them manually or programmatically based on your data.
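If you would rather space the points evenly inside each facet instead of using count itself as the x value, one option is to compute each point's rank within its cohort and use that as the within-facet x value. This is only a minimal sketch in base R: the toy data frame and the column name x_within are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data (not the question's data set)
toy <- data.frame(
  cohort = c("A", "A", "A", "B", "B", "B"),
  count  = c(50, 5, 65, 48, 96, 60)
)

# Rank of each count within its own cohort (1 = smallest).
# ave() applies FUN separately to the values of each group.
toy$x_within <- ave(toy$count, toy$cohort, FUN = rank)

print(toy)
```

With a column like this, you could try mapping aes(x_within, count) instead of aes(count, count) in the faceted plot, so the points climb from left to right in equal steps.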

Normally, facets are spaced apart, but I'm pushing them together via panel.spacing.x.

It's useful to compare the plots side-by-side, so note that I'm using cowplot::plot_grid() to arrange the old and new plots for demonstration purposes here.

One very important note is that I'm adding outlier.shape = NA to the call for geom_boxplot(). This is important because by default any outliers will be shown via the geom_boxplot() command as points, and they would be in the "incorrect" x position. Since we're already handling the desired position for all these points, it's necessary to remove them like this.

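library(ggplot2)
library(cowplot)

# Old plot: the code from the question, with a title added
p <- ggplot(example_data, aes(cohort, count)) +
  geom_boxplot(aes(color = cohort)) +
  geom_point(aes(color = cohort)) +
  scale_y_log10() +
  labs(x = NULL) +
  theme(axis.line.x = element_blank(), axis.ticks.x = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5), legend.position = 'none') +
  labs(title = 'Old Plot')

# New plot: one facet per cohort, with count on both axes
p1 <- ggplot(example_data, aes(count, count)) +
  geom_boxplot(aes(color = cohort), outlier.shape = NA) +  # hide the default outlier points
  geom_point(aes(color = cohort)) +
  facet_wrap(~cohort, scales = 'free_x', strip.position = 'bottom') +
  scale_y_log10() +
  labs(title = 'New Plot', x = NULL) +
  theme_classic() +
  theme(
    panel.spacing.x = unit(0, 'pt'),     # push the facets together
    axis.text.x = element_blank(),
    strip.placement = 'outside',         # facet labels act as the category axis labels
    strip.background = element_blank(),
    axis.ticks.x = element_blank()
  )

# Show the old and new plots side by side
plot_grid(p, p1)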
