1
votes

I am trying to make a double y-axis plot with ggplot2. However, the values and limits of the primary y-axis change, and one of the variables (the "mean" variable) is displayed incorrectly.

Edit: The axis labels for the "mean" variable range from 0.55 to 0.75, which makes the variability hard to see. In the first step of the plot (p <- p + geom_line(aes(y = mean_d, colour = "mean")) + geom_point(aes(y = mean_d, colour = "mean"))), however, they ranged from 0.7757 to 0.7744. The variable should be displayed as in that first step (maybe the problem is the manipulation of the data within the ggplot calls?).

In addition, is it possible to align the tick labels of the first y-axis with those of the second y-axis, so that they are displayed on the same horizontal lines?

# dput(coeff.mean)
coeff.mean <- structure(list(individuals = c(5L, 18L, 31L, 43L, 56L, 69L, 82L, 
95L, 108L, 120L, 133L, 146L, 159L, 172L, 185L, 197L, 210L, 223L, 
236L, 249L, 262L, 274L, 287L, 300L, 313L, 326L, 339L, 351L, 364L, 
377L), mean_d = c(0.775414405190575, 0.774478867355839, 0.774632679560057, 
0.774612015422181, 0.774440717600404, 0.774503749029999, 0.774543337328481, 
0.774536584528457, 0.774518615875444, 0.774572944896752, 0.774553554507719, 
0.774526346948343, 0.774537645238366, 0.774549039219398, 0.774518593880137, 
0.77452848368359, 0.774502654364311, 0.774527249259969, 0.774551190425812, 
0.774524221826879, 0.774514765537317, 0.774541221078135, 0.774552621147008, 
0.774546365564095, 0.774540310535789, 0.774540468208943, 0.774548658706833, 
0.77454534219406, 0.774541081476004, 0.774541996470423), var_d = c(0.000438374265308954, 
0.000345714068446388, 0.000324909665783972, 0.000318897997146887, 
0.000316077108040133, 0.000314032075708385, 0.000310447758209298, 
0.000310325171003455, 0.000311927176741998, 0.000309622062319051, 
0.000308772480851544, 0.000308388263293765, 0.000306838067001956, 
0.000307838047303517, 0.000307737478217495, 0.000306351076037266, 
0.000307288393036824, 0.000306717640522594, 0.000306768886331324, 
0.000306897320278579, 0.000307154374510682, 0.000306352361061403, 
0.000306998606721366, 0.000306434828650204, 0.000305865398401208, 
0.000306061994682725, 0.000305934443005304, 0.000305853730364841, 
0.000306181262913308, 0.000306820996289535)), .Names = c("individuals", 
"mean_d", "var_d"), row.names = c(NA, -30L), class = c("tbl_df", 
"tbl", "data.frame"))

library(ggplot2)

p <- ggplot(coeff.mean, aes(x=individuals))
p <- p + geom_line(aes(y = mean_d, colour = "mean")) + geom_point(aes(y = mean_d, colour = "mean"))
p <- p + geom_line(aes(y = var_d*(max(mean_d)/max(var_d)), colour = "var")) + geom_point(aes(y = var_d*(max(mean_d)/max(var_d)), colour = "var")) 
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*(max(coeff.mean$var_d)/max(coeff.mean$mean_d)), name = "var"))
p <- p + scale_colour_manual(values = c("black", "grey"))
p <- p + labs(y = "mean", x = "Resampled", colour = "Statistic")
print(p)

I do appreciate any advice.

I think you need to be more specific about what is wrong with the plot. What should the y-axis values be? Which variable is displayed wrong? How should it be displayed? - bdemarest
@bdemarest I have edited the question. Thanks in advance. - user1157485
One or both of your y-variables will need to be rescaled in order to show the full range of both mean and var on a single plot. But then the y-axis values shown will be the scaled values, so you would have to set the y-axis breaks and labels manually to show the original values. ggplot does not make these adjustments automatically, in part because the authors want to discourage dual-axis plots. - bdemarest
@bdemarest Thanks for the comment. I will try to do what you say. To be honest, I do not like this kind of plot either. However, in this particular case (considering that I want to plot several parameters in this way) it is better to have 5 plots instead of 10. - user1157485
Your problem is that your variance is on the wrong scale additively, not multiplicatively. The range of each variable is very narrow (+/- 0.0001), so multiplying your variance stretches its spread until it dwarfs the spread in your mean. - Brian

2 Answers

1
votes

This more clearly shows what my comment was pointing out: You don't need to multiplicatively scale var_d, you need to add to it.

library(ggplot2)
library(dplyr)

coeff.mean %>% 
  ggplot(aes(individuals, mean_d)) +
  geom_point(aes(color = "mean_d")) + geom_line(aes(color = "mean_d")) +
  geom_point(aes(individuals, var_d+0.7745, color = "var_d")) + 
  geom_line(aes(individuals, var_d+0.7745, color = "var_d")) +
  scale_y_continuous(sec.axis = sec_axis(trans = ~ . - 0.7745))

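Of course, this figure is problematic for all sorts of reasons. The offset of 0.7745 is arbitrary, so the vertical position of one line relative to the other, and any point where they cross, carries no meaning. That makes it hard to interpret.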
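A quick numeric check may make the scale problem in the question more concrete. The sketch below uses only the minimum and maximum of each column, copied from the question's dput output, and applies the same multiplicative factor as the question's code. The variable names (mean_rng, var_rng, k, scaled_var_rng) are made up for this illustration.

```r
# Minimum and maximum of each column in the question's coeff.mean
mean_rng <- c(0.774440717600404, 0.775414405190575)
var_rng  <- c(0.000305853730364841, 0.000438374265308954)

# The question's multiplicative factor: max(mean_d) / max(var_d)
k <- max(mean_rng) / max(var_rng)

# Range that the scaled variance occupies on the primary axis
scaled_var_rng <- var_rng * k

diff(mean_rng)                          # spread of mean_d, about 0.00097
diff(scaled_var_rng)                    # spread of scaled var_d, about 0.23
diff(scaled_var_rng) / diff(mean_rng)   # about 240 times wider
```

The scaled variance runs from about 0.54 up to the maximum of mean_d, so the primary axis has to cover that whole interval and the mean collapses into what looks like a flat line.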

If you want to scale both multiplicatively and additively, you could try scales::rescale twice: once to map var_d onto the range of mean_d for plotting, and once inside the secondary axis to map the axis values back to the original range of var_d.

coeff.mean %>% 
  mutate(var_rescaled = scales::rescale(var_d, to = range(mean_d))) %>% 
  ggplot(aes(individuals, mean_d)) +
  geom_point(aes(color = "mean_d")) + geom_line(aes(color = "mean_d")) +
  geom_point(aes(y = var_rescaled, color = "var_d")) + 
  geom_line(aes(y = var_rescaled, color = "var_d")) +
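  scale_y_continuous(sec.axis = 
    # `from` is given explicitly; otherwise rescale() uses the range of the axis
    # values it receives (the expanded panel limits), which shifts the labels
    sec_axis(trans = ~scales::rescale(., to = range(coeff.mean$var_d),
                                      from = range(coeff.mean$mean_d)),
             breaks = scales::pretty_breaks(n = 5),
             name = "var_d"))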

This one has problems too. In particular, since the highest values of both mean_d and var_d occur at the same individual, the two lines overlap at that point.
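On the question's second point (tick labels on the same horizontal lines), one possible approach is to write the rescaling as an explicit linear map with its inverse, then pass the same break positions to both axes, each in its own units. This is only a sketch: to_primary() and to_secondary() are hypothetical helper names, and dat is a small stand-in built from the first four rows of coeff.mean, rounded.

```r
library(ggplot2)

# Stand-in for the question's coeff.mean (first four rows, rounded)
dat <- data.frame(
  individuals = c(5, 18, 31, 43),
  mean_d = c(0.775414, 0.774479, 0.774633, 0.774612),
  var_d  = c(0.000438, 0.000346, 0.000325, 0.000319)
)

m_rng <- range(dat$mean_d)
v_rng <- range(dat$var_d)

# Linear map from var_d units to mean_d units, and its exact inverse
to_primary   <- function(v) (v - v_rng[1]) / diff(v_rng) * diff(m_rng) + m_rng[1]
to_secondary <- function(m) (m - m_rng[1]) / diff(m_rng) * diff(v_rng) + v_rng[1]

# One set of break positions, expressed in both units
primary_breaks   <- seq(m_rng[1], m_rng[2], length.out = 5)
secondary_breaks <- to_secondary(primary_breaks)

p <- ggplot(dat, aes(individuals)) +
  geom_line(aes(y = mean_d, colour = "mean_d")) +
  geom_line(aes(y = to_primary(var_d), colour = "var_d")) +
  scale_y_continuous(
    name = "mean_d",
    breaks = primary_breaks,
    sec.axis = sec_axis(~ to_secondary(.), name = "var_d",
                        breaks = secondary_breaks)
  )
```

Because the secondary breaks are the images of the primary breaks under the inverse map, each left-hand label should sit level with a right-hand label. The break values themselves will not be round numbers on both sides, which is the price of forcing them to line up.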

2
votes

Here I show using facets as an alternative to a dual-axis plot. I know it does not answer the original question, sorry!

library(ggplot2)
library(tidyr)

# Convert data to long form with tidyr::gather()
long_dat = gather(data=coeff.mean, key="stat", value="stat_value", mean_d, var_d)

head(long_dat)
# A tibble: 6 x 3
#   individuals   stat stat_value
#         <int>  <chr>      <dbl>
# 1           5 mean_d  0.7754144
# 2          18 mean_d  0.7744789
# 3          31 mean_d  0.7746327
# 4          43 mean_d  0.7746120
# 5          56 mean_d  0.7744407
# 6          69 mean_d  0.7745037

p2 = ggplot(long_dat, aes(x=individuals, y=stat_value, colour=stat)) + 
     geom_point() + 
     geom_line() + 
     scale_colour_manual(values=c(mean_d="black", var_d="grey40")) +
     facet_grid(stat ~ ., scales="free_y")

ggsave("faceted_plot.png", plot=p2, height=4, width=6, dpi=150)

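In current tidyr, gather() is superseded by pivot_longer(). A roughly equivalent reshaping might look like the sketch below, where dat is a stand-in built from the first three rows of coeff.mean, rounded.

```r
library(tidyr)

# Stand-in for the question's coeff.mean (first three rows, rounded)
dat <- data.frame(
  individuals = c(5L, 18L, 31L),
  mean_d = c(0.775414, 0.774479, 0.774633),
  var_d  = c(0.000438, 0.000346, 0.000325)
)

# One row per individual and statistic, as with gather()
long_dat <- pivot_longer(dat, cols = c(mean_d, var_d),
                         names_to = "stat", values_to = "stat_value")
```

The row order differs from gather() (the statistics are interleaved per individual rather than stacked), but that should not affect the faceted plot.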