3
votes

I am trying to find the maximum correlation in each column of a data.frame object by using the cor function. Let's say this object looks like

A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)


M <- data.frame(A,B,C,D,E)
N <- cor(M)

And the correlation matrix looks like

>N

             A           B            C            D            E
A  1.000000000  0.02676645  0.000462529  0.026875495 -0.054506842
B  0.026766455  1.00000000 -0.150622473  0.037911600 -0.071794930
C  0.000462529 -0.15062247  1.000000000  0.015170017  0.026090225
D  0.026875495  0.03791160  0.015170017  1.000000000 -0.001968634
E -0.054506842 -0.07179493  0.026090225 -0.001968634  1.000000000

For the first column (A), I'd like R to return "D", since it's the maximum non-negative, non-"1" value in column A, along with its associated correlation.

Any ideas?

3 Answers

6
votes

Another option:

library(data.table)
# melt() for a matrix is provided by reshape2, so call it explicitly
setDT(reshape2::melt(N))[Var1 != Var2, .SD[which.max(value)], keyby=Var1]

Result with @cory's data (using set.seed(9)):

   Var1 Var2      value
1:    A    D 0.28933634
2:    B    C 0.13483843
3:    C    B 0.13483843
4:    D    A 0.28933634
5:    E    C 0.02588474

To understand how it works, first try running melt(N), which puts the data in long format.
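
For readers without reshape2, here is a rough base R sketch of what the long format looks like. The small 3x3 matrix is made up for illustration (it is not the data from the question), and `as.data.frame(as.table(...))` is only a stand-in for `melt(N)`, not what the answer above actually runs.

```r
# Made-up symmetric matrix standing in for a correlation matrix.
N <- matrix(c( 1.0, 0.2, -0.1,
               0.2, 1.0,  0.4,
              -0.1, 0.4,  1.0),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Long format: one row per (row name, column name) pair.
long <- as.data.frame(as.table(N),
                      responseName = "value",
                      stringsAsFactors = FALSE)

# Drop the self-correlations, mirroring the Var1 != Var2 filter.
off_diag <- long[long$Var1 != long$Var2, ]
off_diag
```

Each variable now appears once per partner, so picking the row with the largest `value` within each `Var1` group gives the answer.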

3
votes

The column numbers are

(n <- max.col(`diag<-`(N,0)))
# [1] 4 4 5 2 3

The names are

colnames(N)[n]
# [1] "D" "D" "E" "B" "C"

The values are

N[cbind(seq_len(nrow(N)),n)]
# [1] 0.02687549 0.03791160 0.02609023 0.03791160 0.02609023
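
The three pieces above can be collected into one table. This is only a sketch on a made-up 3x3 matrix; the column names `variable`, `best_match` and `correlation` are my own choice. Note that `max.col` breaks ties at random by default, so `ties.method = "first"` is used here to keep the result reproducible.

```r
# Made-up symmetric matrix standing in for a correlation matrix.
N <- matrix(c( 1.0, 0.2, -0.1,
               0.2, 1.0,  0.4,
              -0.1, 0.4,  1.0),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Zero the diagonal on a copy; equivalent to `diag<-`(N, 0).
N0 <- N
diag(N0) <- 0

# Column index of the largest entry in each row.
n <- max.col(N0, ties.method = "first")

# Look up names and values with the same index vector.
result <- data.frame(variable    = rownames(N),
                     best_match  = colnames(N)[n],
                     correlation = N[cbind(seq_len(nrow(N)), n)])
result
```

Because the diagonal is set to 0, a row whose off-diagonal correlations are all negative would point back at itself; that fits the "non-negative" requirement in the question, but is worth checking for.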

1
votes

Use apply on the rows to get each row's maximum among the values less than one. Then use which to get the column index, and colnames to get the actual letters.

set.seed(9)
A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)

M <- data.frame(A,B,C,D,E)
N <- cor(M)

N
            A            B           C           D           E
A 1.000000000  0.005865532  0.03595202  0.28933634  0.00795076
B 0.005865532  1.000000000  0.13483843  0.04252079 -0.09567275
C 0.035952017  0.134838434  1.00000000 -0.01160411  0.02588474
D 0.289336335  0.042520787 -0.01160411  1.00000000 -0.12054680
E 0.007950760 -0.095672747  0.02588474 -0.12054680  1.00000000

colnames(N)[apply(N, 1, function (x) which(x==max(x[x<1])))]
[1] "D" "C" "B" "A" "C"
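
The question also asks for the associated correlation, which the one-liner does not return. A possible variant is sketched below; `best_partner` is a hypothetical helper name, the 3x3 matrix is made up, and the diagonal is dropped by position instead of relying on the `x < 1` test.

```r
# Made-up symmetric matrix standing in for a correlation matrix.
N <- matrix(c( 1.0, 0.2, -0.1,
               0.2, 1.0,  0.4,
              -0.1, 0.4,  1.0),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Hypothetical helper: for each row, return the name and value of the
# largest correlation, ignoring the variable's correlation with itself.
best_partner <- function(cm) {
  stopifnot(nrow(cm) == ncol(cm))
  rows <- lapply(seq_len(nrow(cm)), function(i) {
    x <- cm[i, -i]          # row i without its own (diagonal) entry
    j <- which.max(x)       # first maximum if there are ties
    data.frame(variable    = rownames(cm)[i],
               best_match  = names(x)[j],
               correlation = unname(x[j]))
  })
  do.call(rbind, rows)
}

res <- best_partner(N)
res
```

Indexing with `-i` also keeps a genuine off-diagonal correlation of exactly 1 (two perfectly correlated columns), which `x < 1` would filter out.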