I want to calculate the pair-wise correlations between "mpg" and all other numeric variables of interest for each value of cyl in the mtcars dataset. I would like to adopt the tidy data principle.
It's rather easy with corrr::correlate().
library(dplyr)
library(tidyr)
library(purrr)
library(corrr)
data(mtcars)
mtcars2 <- mtcars[, 1:7] %>%
  group_nest(cyl) %>%
  mutate(cors = map(data, corrr::correlate),
         stretch = map(cors, corrr::stretch)) %>%
  unnest(stretch)
mtcars2 %>%
filter(x == "mpg")
By using corrr::correlate(), all available pair-wise correlations have been calculated. I could use dplyr::filter() to select the correlations of interest.
However, when datasets are large, most of the computation is spent on correlations I don't need, which makes this approach very time-consuming. So I tried to calculate only mpg vs. the other variables. I'm not very familiar with purrr, and the following code doesn't work.
mtcars2 <- mtcars[, 1:7] %>%
  group_nest(cyl) %>%
  mutate(comp = map(data, ~colnames),
         corr = map(comp, ~cor.test(data[["mpg"]], data[[.]])))
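For reference, here is a minimal sketch of what I am trying to get, assuming a plain Pearson correlation from stats::cor() is enough. In the attempt above, ~colnames appears to return the colnames function itself rather than the column names, and data inside the second map() refers to the whole list-column rather than one group's data frame. The helper name cor_with_mpg and the output column names x, y and r are my own choices for illustration, not part of any package.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: correlate one target column with every other
# column of a single group's data frame.
cor_with_mpg <- function(df, target = "mpg") {
  others <- setdiff(names(df), target)
  tibble(
    x = target,
    y = others,
    # One correlation per remaining column; nothing else is computed.
    r = map_dbl(others, function(v) cor(df[[target]], df[[v]]))
  )
}

mtcars_cor <- mtcars[, 1:7] %>%
  group_nest(cyl) %>%                      # cyl is dropped from the nested data
  mutate(cors = map(data, cor_with_mpg)) %>%
  select(cyl, cors) %>%
  unnest(cors)

mtcars_cor
```

If p-values are also needed, cor.test(df[[target]], df[[v]]) could presumably be used inside the helper instead, pulling out its estimate and p.value components.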