Finding row/column names from a correlation matrix values

Question

I have a correlation matrix, that contains stock price correlations. it was calculated via:

corMatrix <- cor(cl2014, use="pairwise.complete.obs")

The matrix is much bigger but looks like this:

> corMatrix
             RY.TO.Close CM.TO.Close BNS.TO.Close TD.TO.Close
RY.TO.Close    1.0000000   0.8990782    0.8700985  -0.2505789
CM.TO.Close    0.8990782   1.0000000    0.8240780  -0.4184085
BNS.TO.Close   0.8700985   0.8240780    1.0000000  -0.2141785
TD.TO.Close   -0.2505789  -0.4184085   -0.2141785   1.0000000

> class(corMatrix)
[1] "matrix"

I'm trying to determine how I can get the row and column names of all values in the matrix that have a correlation greater than some value.

I can index the matrix to generate an index matrix like so:

workingset <- corMatrix > 0.85

What I really want is just a list of row/col pairs identified by the row and column name so I know what pairs to do further exploration on.

How can I go from the indexing grid to the row/column names?

I'd ideally also only examine only the lower or upper portion of the matrix as to not generate duplicate values and of course the main diagonal can be ignored as it will always be 1.

A5C1D2H2I1M1N2O1R2T1 A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2014-10-31T02:42:53

Another option is to use melt from "reshape2" and subset:

library(reshape2)
subset(melt(corMatrix), value > .85)
#            Var1         Var2     value
# 1   RY.TO.Close  RY.TO.Close 1.0000000
# 2   CM.TO.Close  RY.TO.Close 0.8990782
# 3  BNS.TO.Close  RY.TO.Close 0.8700985
# 5   RY.TO.Close  CM.TO.Close 0.8990782
# 6   CM.TO.Close  CM.TO.Close 1.0000000
# 9   RY.TO.Close BNS.TO.Close 0.8700985
# 11 BNS.TO.Close BNS.TO.Close 1.0000000
# 16  TD.TO.Close  TD.TO.Close 1.0000000

You would need to do melt(as.matrix(corMatrix)) if your dataset is a data.frame since there are different melt methods for matrices and data.frames.

Update

As you mention you're only interested in the values from the upper triangle (to avoid duplicate pairs/values) and excluding the diagonal, you can do the following:

CM <- corMatrix                               # Make a copy of your matrix
CM[lower.tri(CM, diag = TRUE)] <- NA          # lower tri and diag set to NA
subset(melt(CM, na.rm = TRUE), value > .85)   # melt and subset as before
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985

You could also do this with base R. Continuing with "CM" from above, try:

subset(na.omit(data.frame(expand.grid(dimnames(CM)), value = c(CM))), value > .85)
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985

Finding row/column names from a correlation matrix values

3 Answers

Update