
I'm new to R and I'm trying to create a correlation matrix that will also include p-values.

The main issue I'm having is computing correlations between specific numeric variables separately for each combination of three factors.

My data looks something like this:

    df <- data.frame(
      cond = c("low", "medium", "high"),
      group = c("gr1", "gr2", "gr3"),
      rand = c("yes", "no"),
      trial1 = rnorm(30),
      trial2 = rnorm(30)
    )
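A detail worth noticing in this example: `cond` and `group` both have length 3, so `data.frame()` recycles them in lockstep (`low` always pairs with `gr1`, `medium` with `gr2`, `high` with `gr3`). Only 6 of the 3 × 3 × 2 = 18 possible combinations therefore occur, each with 5 rows, so every per-combination correlation is based on just 5 observations. A quick base-R check (the name `counts` is only illustrative):

```r
# Example data in the shape described in the question
df <- data.frame(
  cond = c("low", "medium", "high"),
  group = c("gr1", "gr2", "gr3"),
  rand = c("yes", "no"),
  trial1 = rnorm(30),
  trial2 = rnorm(30)
)

# Count rows per observed combination; drop = TRUE removes combinations with no rows
counts <- table(interaction(df$cond, df$group, df$rand, drop = TRUE))
counts          # each observed combination has 5 rows
length(counts)  # 6 of the 3 x 3 x 2 = 18 possible combinations
```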

I want to correlate `trial1` and `trial2` within each unique combination of `cond`, `group`, and `rand`. Essentially, for each combination of factor levels I would like to get an r-value and a p-value, and save them in a matrix.

I tried it the long way: extracting the observations I want to correlate with three logical tests, like this: `(df$cond == "low") & (df$group == "gr1") & (df$rand == "yes")`. This gave me what I needed, but the code is very long and doesn't save the values in a matrix.
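For reference, that long way for a single combination might look roughly like this (a sketch assuming the data frame is named `df`; `sel` and `ct` are illustrative names). Repeating it by hand for every combination is what makes the approach tedious:

```r
# Example data in the shape described in the question
df <- data.frame(
  cond = c("low", "medium", "high"),
  group = c("gr1", "gr2", "gr3"),
  rand = c("yes", "no"),
  trial1 = rnorm(30),
  trial2 = rnorm(30)
)

# Logical index selecting one combination of the three factors
sel <- df$cond == "low" & df$group == "gr1" & df$rand == "yes"

# Pearson correlation test on the selected rows only
ct <- cor.test(df$trial1[sel], df$trial2[sel])
c(r = unname(ct$estimate), p.value = ct$p.value)
```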

I've never written a for loop before, so I'd appreciate it if anyone could show me how to do this with one, or suggest another efficient way of doing it.

Thank you!


2 Answers


I don't really understand what you are trying to do, but here is how you would estimate a correlation matrix with p-values for each possible combination of the first three variables:

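    by(df[, c("trial1", "trial2")], list(df$cond, df$group, df$rand), function(x) {
      # correlation matrix of trial1/trial2, plus the p-value of their correlation test
      list(cor = cor(x), p.value = cor.test(x[, 1], x[, 2])$p.value)
    })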
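Since the question specifically mentions for loops, here is a minimal base-R sketch of the same per-combination computation written as an explicit loop, with r and p stored in a matrix. The names `combos` and `res`, and the `cond.group.rand` row-naming scheme, are illustrative choices, not part of either answer:

```r
# Example data in the shape described in the question
df <- data.frame(
  cond = c("low", "medium", "high"),
  group = c("gr1", "gr2", "gr3"),
  rand = c("yes", "no"),
  trial1 = rnorm(30),
  trial2 = rnorm(30)
)

# One row per combination of the three factors that actually occurs
combos <- unique(df[, c("cond", "group", "rand")])

# Pre-allocate the result: one row per combination, columns for r and p
res <- matrix(NA_real_, nrow = nrow(combos), ncol = 2,
              dimnames = list(paste(combos$cond, combos$group, combos$rand, sep = "."),
                              c("r", "p.value")))

for (i in seq_len(nrow(combos))) {
  # Logical index of the rows belonging to combination i
  sel <- df$cond == combos$cond[i] &
    df$group == combos$group[i] &
    df$rand == combos$rand[i]
  ct <- cor.test(df$trial1[sel], df$trial2[sel])
  res[i, "r"] <- ct$estimate
  res[i, "p.value"] <- ct$p.value
}

res
```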
    library(dplyr)
    library(tidyr)
    library(purrr)
    
    d <- data.frame(
      cond = c("low", "medium", "high"),
      group = c("gr1", "gr2", "gr3"),
      rand = c("yes", "no"),
      trial1 = rnorm(30),
      trial2 = rnorm(30)
    )
    
    x <- d %>% 
      group_by(cond, rand, group) %>% 
      nest() %>% 
      mutate(
        cor_test = map(data, function(i) cor.test(i$trial1, i$trial2)),
        correlation = map_dbl(cor_test, ~ .x$estimate),
        p.value = map_dbl(cor_test, ~ .x$p.value)
      )
    
    x
    #> # A tibble: 6 x 7
    #>   cond   rand  group data             cor_test correlation p.value
    #>   <fct>  <fct> <fct> <list>           <list>         <dbl>   <dbl>
    #> 1 low    yes   gr1   <tibble [5 x 2]> <htest>      -0.0329   0.958
    #> 2 medium no    gr2   <tibble [5 x 2]> <htest>       0.489    0.403
    #> 3 high   yes   gr3   <tibble [5 x 2]> <htest>      -0.413    0.490
    #> 4 low    no    gr1   <tibble [5 x 2]> <htest>      -0.240    0.697
    #> 5 medium yes   gr2   <tibble [5 x 2]> <htest>      -0.144    0.817
    #> 6 high   no    gr3   <tibble [5 x 2]> <htest>       0.0361   0.954

Created on 2019-08-23 by the reprex package (v0.3.0)

  1. First, group the data by all combinations of your factor levels.
  2. Then "nest" the data: for each group from step 1, the remaining columns are stored as a subset of your data frame in a list-column called `data` (the default name).
  3. Create a new list-column, `cor_test`, that stores the result of `cor.test()` on `trial1` and `trial2` for each subset.
  4. Create new columns, `correlation` and `p.value`, that extract the r (`estimate`) and p (`p.value`) elements from each object stored in `cor_test`.

This approach is very flexible: you only need to specify the names of the variables you want to correlate (here `trial1` and `trial2`).
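To illustrate the point about only needing to name the variables, here is a base-R sketch (not a line-by-line translation of the dplyr code) in which the variable and grouping-column names are function arguments. `cor_by_group()` is a hypothetical helper name; `split()` plays a role similar to `group_by()` + `nest()`, and `vapply()` a role similar to `map_dbl()`:

```r
# Hypothetical helper (not from the answer above): correlate two named numeric
# columns within every observed combination of the grouping columns.
cor_by_group <- function(data, var1, var2, groups) {
  # One data frame per observed combination; drop = TRUE skips empty combinations
  pieces <- split(data, data[groups], drop = TRUE)
  # Run cor.test() on each piece and keep r and the p-value
  out <- vapply(pieces, function(piece) {
    ct <- cor.test(piece[[var1]], piece[[var2]])
    c(r = unname(ct$estimate), p.value = ct$p.value)
  }, FUN.VALUE = c(r = 0, p.value = 0))
  # vapply() puts the statistics in rows; transpose to one row per combination
  t(out)
}

# Example data in the shape described in the question
df <- data.frame(
  cond = c("low", "medium", "high"),
  group = c("gr1", "gr2", "gr3"),
  rand = c("yes", "no"),
  trial1 = rnorm(30),
  trial2 = rnorm(30)
)

res <- cor_by_group(df, "trial1", "trial2", groups = c("cond", "group", "rand"))
res
```

Swapping in other column names only changes the `var1`, `var2`, and `groups` arguments.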