
I have many plots, each showing two time series.

That is to say, I have one plot of y_1 and y_2 against a common set of dates.

For each plot, I would like to display the correlation between the two series on the plot itself. That is to say, I would like to compute cor(y_1, y_2) and include the resulting number on each plot.

This is surprisingly difficult to do in a principled way in ggplot2. I've found no simple way to do it using stat_cor so far.

I have already looked at other functions recommended for this task, but they are all designed for reporting the correlation of y_1 and y_2 when y_1 is plotted against y_2, rather than when both y_1 and y_2 are plotted against time.

I would prefer a ggplot2-ish way to do this but I'm open to using any graphics software within R. Here is code for a minimal working example and what I have tried.

library(reprex); library(ggplot2); library(ggpubr)
n <- 6; 
Q=sample(18:30, n, replace=TRUE)

# make sample data
dat <- data.frame(id=1:n, 
                  date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
                  group=rep(LETTERS[1:2], n/2),
                  quantity= Q,
                  price= 100 - 2*Q + rnorm(n))
dat
#>   id       date group quantity    price
#> 1  1 2020-12-26     A       19 63.02628
#> 2  2 2020-12-27     B       26 49.66597
#> 3  3 2020-12-28     A       27 44.98031
#> 4  4 2020-12-29     B       24 51.11224
#> 5  5 2020-12-30     A       29 41.11129
#> 6  6 2020-12-31     B       28 43.04494

tseriesplot <- ggplot(dat, aes(x = date)) + ggtitle("Oil: Daily Quantity and Price") +
                  geom_line(aes(y = quantity, color = "Quantity (thousands of barrels)")) +
                  geom_line(aes(y = price, color = "Price"))
  
tseriesplot


# naive attempt fails
tseriesplot + stat_cor(data = dat, aes(x=quantity, y=price),method="pearson")
#> Error: Invalid input: date_trans works with objects of class Date only

Created on 2021-01-05 by the reprex package (v0.3.0)

I thought this would be a good question because it is similar to more complex questions elsewhere, e.g. https://stat.ethz.ch/pipermail/r-help/2020-July/467805.html but much more basic.

2 Answers

2
votes

1) annotate Create the text txt you want to plot and then use annotate:

txt <- with(dat, sprintf("cor: %.2f", cor(quantity, price)))
tseriesplot + 
  annotate("text", label = txt, x = min(dat$date), y = max(dat$quantity, dat$price), 
    hjust = -0.1)
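
Since the question mentions many plots, the annotate approach could be wrapped in a small helper. The following is only a sketch: `cor_label` and `add_cor_label` are hypothetical names that are not part of ggplot2 or ggpubr, and the code assumes each data frame holds the date column and both series under the column names you pass in.

```r
# Hypothetical helper: format the Pearson correlation of two series as a label.
cor_label <- function(y1, y2, digits = 2) {
  r <- cor(y1, y2, use = "complete.obs")  # ignore incomplete pairs, if any
  sprintf(paste0("cor: %.", digits, "f"), r)
}

# Hypothetical helper: add the label near the top-left corner of an existing
# time series plot `p`. Assumes `data` contains the three named columns.
add_cor_label <- function(p, data, date_col, y1_col, y2_col) {
  p + ggplot2::annotate(
    "text",
    label = cor_label(data[[y1_col]], data[[y2_col]]),
    x = min(data[[date_col]]),
    y = max(data[[y1_col]], data[[y2_col]]),
    hjust = -0.1
  )
}
```

With the example data this would be called as add_cor_label(tseriesplot, dat, "date", "quantity", "price").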

2) grid.text Another approach is to use grid graphics which allows one to specify the location independently of the data. Using txt from above:

library(grid)

tseriesplot
grid.text(txt, 0.1, 0.9)

3a) zoo This would also work:

library(zoo)

z <- read.zoo(dat[c("date", "price", "quantity")])
txt <- sprintf("cor: %.2f", cor(z)[2])
autoplot(z, facet = NULL) +
  annotate("text", label = txt, x = start(z), y = max(z), hjust = -0.1)

3b) scale Alternatively, scale the variables first; scaling does not affect the correlation:

z <- scale(z)
autoplot(z, facet = NULL) +
  annotate("text", label = txt, x = start(z), y = max(z), hjust = -0.1)
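
The claim that scaling leaves the correlation unchanged can be checked directly. This is a small base R sketch on made-up data (the vectors `x` and `y` and the seed are arbitrary and not taken from the question):

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(20)
y <- 100 - 2 * x + rnorm(20)

r_raw    <- cor(x, y)
# scale() centers and rescales each vector and returns a one-column matrix
r_scaled <- cor(scale(x)[, 1], scale(y)[, 1])

# The two values agree up to floating-point tolerance
isTRUE(all.equal(r_raw, r_scaled))
```

This is why the label txt computed from the unscaled series can be reused on the scaled plot, where both lines share a comparable vertical range.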

Discussion

Overall, combining parts of the different solutions, this seems to be the most compact:

library(zoo)
library(grid)

z <- read.zoo(dat[c("date", "price", "quantity")])
autoplot(z, facet = NULL)
grid.text(sprintf("cor: %.2f", cor(z)[2]), 0.1, 0.9)
1
votes

Instead of trying to figure out how to do this with ggpubr::stat_cor you could simply compute the correlation coefficient and add it as an annotation to your plot using e.g. annotate:

library(ggplot2)
library(ggpubr)

set.seed(42)

n <- 6; 
Q=sample(18:30, n, replace=TRUE)

# make sample data
dat <- data.frame(id=1:n, 
                  date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
                  group=rep(LETTERS[1:2], n/2),
                  quantity= Q,
                  price= 100 - 2*Q + rnorm(n))
dat
#>   id       date group quantity    price
#> 1  1 2020-12-26     A       18 64.63286
#> 2  2 2020-12-27     B       22 56.40427
#> 3  3 2020-12-28     A       18 63.89388
#> 4  4 2020-12-29     B       26 49.51152
#> 5  5 2020-12-30     A       27 45.90534
#> 6  6 2020-12-31     B       21 60.01842

tseriesplot <- ggplot(dat, aes(x = date)) + ggtitle("Oil: Daily Quantity and Price") +
  geom_line(aes(y = quantity, color = "Quantity (thousands of barrels)")) +
  geom_line(aes(y = price, color = "Price"))

tseriesplot +
  annotate("text", 
           x = min(dat$date), 
           y = 70, 
           label = paste0("r = ", scales::number(cor(dat$quantity, dat$price, method = "pearson"), accuracy = .01)),
           hjust = 0)