I have a data frame with one dependent variable and 20 independent variables. I would like to find the correlation coefficient between the dependent variable and each of the independent variables, along with the associated p-value. I wrote the following loop:

for (i in 2:20){
     correl = cor.test(df[ , i], df[ , 22])
     print(correl)
}

It prints out one correlation coefficient and its p-value at a time. Is there a function that will produce the same results in a tabular format?

Just for reference, I wrote a blog post on getting correlations after seeing this sort of question on Stack Overflow many times: drsimonj.svbtle.com/… - Simon Jackson

1 Answer

You can use sapply to grab a vector of results, one for each pair:

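base.idx <- 1      # column index of the dependent variable
other.idx <- 2:20  # column indices of the variables to correlate it with
# Run each test once, then pull out the pieces you need
tests <- lapply(other.idx, function(i) cor.test(df[,base.idx], df[,i]))
cors <- unname(sapply(tests, function(t) t$estimate))
pvals <- unname(sapply(tests, function(t) t$p.value))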

Here's an example with the built-in iris dataset, grabbing the correlations and p-values between Sepal.Length and the other three numeric columns:

base.idx <- 1
other.idx <- 2:4
(cors <- unname(sapply(other.idx, function(i) cor.test(iris[,base.idx], iris[,i])$estimate)))
# [1] -0.1175698  0.8717538  0.8179411
(pvals <- unname(sapply(other.idx, function(i) cor.test(iris[,base.idx], iris[,i])$p.value)))
# [1] 0.1518983 0.0000000 0.0000000
# (shown rounded: the last two p-values are extremely small, not exactly zero)
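
Since the question asks for a table, one option is to collect the same pieces into a data frame instead of separate vectors. The following is only a sketch: the helper name cor_table and the column names variable, estimate and p.value are my own choices for illustration, not an existing base R function. It runs cor.test once per column and stacks the results.

```r
# Hypothetical helper (not part of base R): correlate one column against
# several others and return the estimates and p-values as a data frame.
cor_table <- function(data, base.idx, other.idx) {
  rows <- lapply(other.idx, function(i) {
    # One test per pair; [[ ]] extracts each column as a plain vector
    ct <- cor.test(data[[base.idx]], data[[i]])
    data.frame(
      variable = names(data)[i],
      estimate = unname(ct$estimate),
      p.value  = ct$p.value
    )
  })
  do.call(rbind, rows)  # stack the one-row data frames into a table
}

# Same iris example as above: Sepal.Length against columns 2 to 4
res <- cor_table(iris, base.idx = 1, other.idx = 2:4)
print(res)
```

For your data you would call it with your own indices, for example cor_table(df, base.idx = 1, other.idx = 2:20), assuming the dependent variable is in column 1 and all of the listed columns are numeric.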