5 votes

I'd like to use R to find the critical values for the Pearson correlation coefficient.

This has proved difficult to find in search engines, since the standard variable for the Pearson correlation coefficient is itself r. As a result, I keep finding tables of critical values of r, rather than instructions for computing them with the statistical package R.

I'm looking for a function that will provide output like the following:

[image: a table of critical values of r, indexed by degrees of freedom and alpha level]

I'm comfortable finding the correlation with:

cor(x,y)

However, I'd also like to find the critical values.

Is there a function I can use to enter n (or degrees of freedom) as well as alpha in order to find the critical value?


3 Answers

8 votes

The significance of a correlation coefficient r is determined by converting r to a t-statistic and then finding the significance of that t-value at df = n - 2 degrees of freedom, where n is the sample size. So you can use R to find the critical t-value and then invert that conversion to get the critical correlation coefficient.

critical.r <- function( n, alpha = .05 ) {
  df <- n - 2                                          # degrees of freedom for a correlation test
  critical.t <- qt( alpha/2, df, lower.tail = FALSE )  # two-tailed critical t-value
  # invert t = r * sqrt(df) / sqrt(1 - r^2) to solve for r
  critical.r <- sqrt( (critical.t^2) / ( (critical.t^2) + df ) )
  return( critical.r )
}
# Example usage: Critical correlation coefficient at sample size of n = 100
critical.r( 100 )
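As a quick sanity check (a sketch; the function is repeated here so the snippet runs on its own), a correlation sitting exactly at the critical value should come back with p equal to alpha when run through the same t-conversion:

```r
# Sanity check: a correlation exactly at the critical value should yield
# p = alpha when converted back to a t-statistic.
critical.r <- function(n, alpha = .05) {
  df <- n - 2
  critical.t <- qt(alpha / 2, df, lower.tail = FALSE)
  sqrt(critical.t^2 / (critical.t^2 + df))
}

r.crit <- critical.r(100)                          # roughly 0.197 at n = 100
t.stat <- r.crit * sqrt(98) / sqrt(1 - r.crit^2)   # the r-to-t conversion
p <- 2 * pt(t.stat, 98, lower.tail = FALSE)        # two-tailed p; equals alpha
```

The value of about 0.197 for df = 98 agrees with published critical-value tables.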
3 votes

The general structure of hypothesis testing is kind of a mish-mash of two systems: Fisherian and Neyman-Pearson. Statisticians understand the differences, but this rarely gets presented clearly in undergraduate stats classes. R was designed by and for statisticians as a toolbox, so they constructed a function named cor.test that delivers a p-value (part of the Fisherian tradition) as well as a confidence interval for "r" (derived from the Neyman-Pearson formalism). Fisher and Neyman had bitter disputes during their lifetimes. The "critical value" terminology belongs to the N-P testing strategy. It is equivalent to building a confidence interval and finding the particular statistic that sits exactly at the 0.05 significance threshold.
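For example, a single call to cor.test reports both quantities (the data here are simulated purely for illustration):

```r
# cor.test() returns the Fisherian p-value and the Neyman-Pearson-style
# confidence interval in one call. Simulated data for illustration only.
set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)
res <- cor.test(x, y)
res$estimate   # sample r
res$p.value    # p-value for H0: rho = 0
res$conf.int   # 95% confidence interval for rho
```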

The code for constructing the inferential statistics in cor.test is available with:

 methods(cor.test)
 getAnywhere(cor.test.default)
 #  scroll down 
 method <- "Pearson's product-moment correlation"
 #-----partial code----
    r <- cor(x, y)
    df <- n - 2L
    ESTIMATE <- c(cor = r)
    PARAMETER <- c(df = df)
    STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))
    p <- pt(STATISTIC, df)
   # ---- omitted some set up and error checking ----
   # this is the confidence interval  section------
        z <- atanh(r)
        sigma <- 1/sqrt(n - 3)
        cint <- switch(alternative, less = c(-Inf, z + sigma * 
            qnorm(conf.level)), greater = c(z - sigma * qnorm(conf.level), 
            Inf), two.sided = z + c(-1, 1) * sigma * qnorm((1 + 
            conf.level)/2))
        cint <- tanh(cint)

So now you know how R does it. Notice that there is no "critical value" mentioned. I suspect that your hope was to find some table where a tabulation of "r" and "df" was laid out displaying the minimum "r" that would reach a significance of 0.05 for a given 'df'. Such a table could be built but that's not how this particular toolbox is constructed. You should now have the tools to build it yourself.
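For instance, here is one sketch of such a table, built from the same t-to-r inversion shown in the other answer (the function name and the choice of alpha columns are mine):

```r
# Build a table of the minimum |r| reaching significance, indexed by
# df (rows) and alpha (columns), via the critical t-value from qt().
critical.r.table <- function(df, alphas = c(0.10, 0.05, 0.02, 0.01)) {
  t.crit <- outer(df, alphas, function(d, a) qt(a / 2, d, lower.tail = FALSE))
  r.crit <- sqrt(t.crit^2 / (t.crit^2 + df))   # df recycles down the rows
  dimnames(r.crit) <- list(df = df, alpha = alphas)
  round(r.crit, 3)
}
critical.r.table(c(5, 10, 20, 50, 100))
```

The df = 10, alpha = 0.05 entry (0.576) matches the standard printed tables.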

2 votes

I would do the same. But if you are using a Spearman correlation, you need to convert t into r with a different formula.

Just replace the last line before the return in the function above with this one:

critical.r <- sqrt(((critical.t^2) / (df)) + 1)