3
votes

I want to find the correlation, p-value and 95% CI between one specific column and every other column in a data frame. The 'broom' package provides an example of how to do that for two columns using cor.test with dplyr and pipes. For mtcars and, say, the mpg column, we can run a correlation with one other column:

library(dplyr)
library(broom)
mtcars %>% do(tidy(cor.test(.$mpg, .$cyl)))

   estimate statistic      p.value parameter   conf.low  conf.high
1 -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171

The output is a single-row data frame. I'd like to run cor.test for mpg against each of the other columns and put each result in its own row. With mpg paired with every other column, the desired output would look like this:

       estimate statistic      p.value parameter   conf.low  conf.high
cyl   -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171
disp -0.8475514 -8.747152 9.380327e-10        30 -0.9233594 -0.7081376
hp   -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
drat  0.6811719  5.096042  1.77624e-05        30  0.4360484   0.832201
wt   -0.8676594 -9.559044 1.293959e-10        30 -0.9338264 -0.7440872
qsec   0.418684  2.525213   0.01708199        30 0.08195487  0.6696186
vs    0.6640389  4.864385 3.415937e-05        30   0.410363  0.8223262
am    0.5998324  4.106127 0.0002850207        30  0.3175583   0.784452
gear  0.4802848  2.999191  0.005400948        30  0.1580618  0.7100628
carb -0.5509251  -3.61575  0.001084446        30  -0.754648 -0.2503183

Note the added row names in the first column. They show which column was paired with mpg for the cor.test. Ideally, I'd like to do this with dplyr and pipes.

1 Answer

5
votes

Here's a solution that sticks with the do approach. The step you're missing is to gather your data into long format (one row per variable/value pair) and then group by the variable name.

library(dplyr)
library(tidyr)
library(broom)

mtcars %>%
  gather(var, value, -mpg) %>%                       # long format: mpg, var, value
  group_by(var) %>%
  do(tidy(cor.test(.$mpg, .$value))) %>%             # one cor.test per variable
  ungroup() %>%
  mutate(var = factor(var, names(mtcars)[-1])) %>%   # restore the original column order
  arrange(var)
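
For readers on more recent versions of tidyr and dplyr, the same gather/group/test idea can be sketched with pivot_longer() and group_modify(), which have superseded gather() and do(). This is only a sketch, not part of the original answer: it assumes a dplyr version that provides group_modify() and a tidyr version that provides pivot_longer(), and the name `result` is just illustrative.

```r
library(dplyr)
library(tidyr)
library(broom)

result <- mtcars %>%
  # Long format: keep mpg, stack every other column into (var, value) pairs
  pivot_longer(-mpg, names_to = "var", values_to = "value") %>%
  group_by(var) %>%
  # Run one cor.test per variable; .x is the group's rows without the grouping column
  group_modify(~ tidy(cor.test(.x$mpg, .x$value))) %>%
  ungroup() %>%
  # Restore the original column order (relies on mpg being the first column)
  mutate(var = factor(var, levels = names(mtcars)[-1])) %>%
  arrange(var)

result
```

The result keeps the paired column's name in `var` rather than in row names, since tibbles do not carry row names.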

And here's an example that's closer to a base R approach (I used pipes for convenience, but it's easy to adapt to run without them):

library(dplyr)
library(broom)

xvar <- "mpg"
yvar <- names(mtcars)[!names(mtcars) %in% xvar]

lapply(yvar,
       function(yvar, xvar, DF)
       {
         cor.test(DF[[xvar]], DF[[yvar]]) %>%
           tidy()
       },
       xvar,
       mtcars) %>%
  bind_rows() %>%
  mutate(yvar = yvar) %>%
  select(yvar, estimate:conf.high)
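
If the row names from the desired output matter, a version with no packages at all might look like the following. It is a rough sketch rather than the answer's own method: it pulls the pieces out of the cor.test result by hand instead of using tidy(), and the names `yvars` and `res` are made up for illustration.

```r
xvar  <- "mpg"
yvars <- setdiff(names(mtcars), xvar)

# sapply() over a character vector names each result after its element,
# so after transposing, each row is labelled with the paired column.
res <- t(sapply(yvars, function(v) {
  ct <- cor.test(mtcars[[xvar]], mtcars[[v]])
  c(estimate  = unname(ct$estimate),
    statistic = unname(ct$statistic),
    p.value   = ct$p.value,
    parameter = unname(ct$parameter),
    conf.low  = ct$conf.int[1],
    conf.high = ct$conf.int[2])
}))

res <- as.data.frame(res)
res
```

Individual results can then be looked up by name, for example res["cyl", "estimate"].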