4
votes

I am looking for a way to calculate the multiple correlation coefficient in R (http://en.wikipedia.org/wiki/Multiple_correlation). Is there a built-in function to calculate it? I have one dependent variable and three independent ones. I have not been able to find anything online. Any ideas?

3
What do you mean by "program the formula"? Please read this about asking questions in a way that makes it easy for people to help you. More resources here. - Bryan Hanson
I mean, is there a built-in function to calculate such a thing, or do you have to calculate it yourself? - user1594047

3 Answers

4
votes

The built-in function lm gives at least one version, reported as "Multiple R-squared" in the model summary. I'm not sure whether this is what you are looking for:

fit <- lm(yield ~ N + P + K, data = npk)
summary(fit)

Gives:

Call:
lm(formula = yield ~ N + P + K, data = npk)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.2667 -3.6542  0.7083  3.4792  9.3333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   54.650      2.205  24.784   <2e-16 ***
N1             5.617      2.205   2.547   0.0192 *  
P1            -1.183      2.205  -0.537   0.5974    
K1            -3.983      2.205  -1.806   0.0859 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.401 on 20 degrees of freedom
Multiple R-squared:  0.3342,    Adjusted R-squared:  0.2343 
F-statistic: 3.346 on 3 and 20 DF,  p-value: 0.0397

More info on what's going on at ?summary.lm and ?lm.
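If the multiple correlation coefficient itself is wanted rather than R-squared, one option is to take the square root of the r.squared component of the summary object. A minimal sketch, assuming the same npk model as above:

```r
# Fit the same model as above on the built-in npk dataset
fit <- lm(yield ~ N + P + K, data = npk)

# summary.lm stores Multiple R-squared in the r.squared component;
# its square root is the multiple correlation coefficient
R <- sqrt(summary(fit)$r.squared)
R  # roughly 0.578, i.e. sqrt(0.3342)
```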

4
votes

The easiest way to calculate the multiple correlation coefficient (i.e. the correlation between two or more variables on the one hand, and one variable on the other) is to fit a multiple linear regression, predicting the values of one variable treated as dependent from the values of two or more variables treated as independent. Then calculate the coefficient of correlation between the predicted and observed values of the dependent variable.

Here, for example, we create a linear model called mpg.model, with mpg as the dependent variable and wt and cyl as the independent variables, using the built-in mtcars dataset:

> mpg.model <- lm(mpg ~ wt + cyl, data = mtcars)

Having created the above model, we correlate the observed values of mpg (which are embedded in the object, within the model data frame) with the predicted values for the same variable (also embedded):

> cor(mpg.model$model$mpg, mpg.model$fitted.values)
[1] 0.9111681

R will in fact do this calculation for you, but without telling you so, when you ask it to create the summary of a model (as in Brian's answer): the summary of an lm object contains R-squared, which, for a model with an intercept, is the square of the multiple correlation coefficient. So an alternative way to get the same result is to extract R-squared from the summary.lm object and take its square root:

> sqrt(summary(mpg.model)$r.squared)
[1] 0.9111681
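
The two steps above (fit the model, then correlate observed with fitted values) can be wrapped in a small function. This is only a sketch: multiple_cor is a hypothetical helper name, not something that ships with R, and it assumes an ordinary lm fit with a numeric response.

```r
# Hypothetical helper (not part of base R): multiple correlation coefficient
# of the response in `formula` with the predictors in `formula`
multiple_cor <- function(formula, data) {
  fit <- lm(formula, data = data)
  # Observed response values as stored in the fitted model's model frame
  observed <- model.response(model.frame(fit))
  # Correlation between observed and fitted values
  cor(observed, fitted(fit))
}

multiple_cor(mpg ~ wt + cyl, data = mtcars)
```

For the mtcars model this should reproduce the value obtained above, since it performs the same calculation.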

-1
votes

Try this:

# load sample data
data(mtcars)

# calculate the pairwise correlation coefficients between all variables in
# `mtcars` using the built-in function cor()

M <- cor(mtcars)

# M is a matrix of correlation coefficients, which you can display just by
# running

print(M)

# If you want to plot the correlation coefficients (requires the corrplot package)

library(corrplot)
corrplot(M, method = "number", type = "lower", insig = "blank", number.cex = 0.6)
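
Note that cor() returns pairwise correlations, not the multiple correlation coefficient. The latter can, however, be derived from such a matrix using the identity given on the Wikipedia page linked in the question, R^2 = c' Rxx^-1 c, where c holds the correlations between each predictor and the dependent variable and Rxx is the correlation matrix among the predictors. A minimal sketch, assuming mpg as the dependent variable and wt and cyl as predictors (these choices are for illustration only):

```r
# Pairwise correlation matrix of the built-in mtcars dataset
M <- cor(mtcars)

# Illustrative choice of variables (an assumption, not from the answer above)
response   <- "mpg"
predictors <- c("wt", "cyl")

# c: correlations between each predictor and the response
c_vec <- M[predictors, response]
# Rxx: correlations among the predictors themselves
R_xx <- M[predictors, predictors]

# R^2 = c' Rxx^-1 c; solve(R_xx, c_vec) computes Rxx^-1 c without
# forming the inverse explicitly
R2 <- sum(c_vec * solve(R_xx, c_vec))
sqrt(R2)
```

This should agree with the square root of R-squared from lm(mpg ~ wt + cyl, data = mtcars).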