5
votes

I have a correlation matrix, that contains stock price correlations. it was calculated via:

corMatrix <- cor(cl2014, use="pairwise.complete.obs")

The matrix is much bigger but looks like this:

> corMatrix
             RY.TO.Close CM.TO.Close BNS.TO.Close TD.TO.Close
RY.TO.Close    1.0000000   0.8990782    0.8700985  -0.2505789
CM.TO.Close    0.8990782   1.0000000    0.8240780  -0.4184085
BNS.TO.Close   0.8700985   0.8240780    1.0000000  -0.2141785
TD.TO.Close   -0.2505789  -0.4184085   -0.2141785   1.0000000

> class(corMatrix)
[1] "matrix"

I'm trying to determine how I can get the row and column names of all values in the matrix that have a correlation greater than some value.

I can index the matrix to generate an index matrix like so:

workingset <- corMatrix > 0.85

What I really want is just a list of row/col pairs identified by the row and column name so I know what pairs to do further exploration on.

How can I go from the indexing grid to the row/column names?

I'd ideally also only examine only the lower or upper portion of the matrix as to not generate duplicate values and of course the main diagonal can be ignored as it will always be 1.

3
Maybe which(corMatrix > 0.85, arr.ind = TRUE)?user3710546

3 Answers

10
votes

Another option is to use melt from "reshape2" and subset:

library(reshape2)
subset(melt(corMatrix), value > .85)
#            Var1         Var2     value
# 1   RY.TO.Close  RY.TO.Close 1.0000000
# 2   CM.TO.Close  RY.TO.Close 0.8990782
# 3  BNS.TO.Close  RY.TO.Close 0.8700985
# 5   RY.TO.Close  CM.TO.Close 0.8990782
# 6   CM.TO.Close  CM.TO.Close 1.0000000
# 9   RY.TO.Close BNS.TO.Close 0.8700985
# 11 BNS.TO.Close BNS.TO.Close 1.0000000
# 16  TD.TO.Close  TD.TO.Close 1.0000000

You would need to do melt(as.matrix(corMatrix)) if your dataset is a data.frame since there are different melt methods for matrices and data.frames.


Update

As you mention you're only interested in the values from the upper triangle (to avoid duplicate pairs/values) and excluding the diagonal, you can do the following:

CM <- corMatrix                               # Make a copy of your matrix
CM[lower.tri(CM, diag = TRUE)] <- NA          # lower tri and diag set to NA
subset(melt(CM, na.rm = TRUE), value > .85)   # melt and subset as before
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985

You could also do this with base R. Continuing with "CM" from above, try:

subset(na.omit(data.frame(expand.grid(dimnames(CM)), value = c(CM))), value > .85)
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
4
votes

You can use which to get a matrix of row/col pairs. Use the arr.ind argument. Then we can match the row and column names for the pairs and put them into a data frame with their respective values.

w <- which(corMatrix > 0.85, arr.ind = TRUE)
data.frame(row = rownames(w), col = colnames(corMatrix)[w[, "col"]], 
           value = corMatrix[corMatrix > 0.85])
#            row          col     value
# 1  RY.TO.Close  RY.TO.Close 1.0000000
# 2  CM.TO.Close  RY.TO.Close 0.8990782
# 3 BNS.TO.Close  RY.TO.Close 0.8700985
# 4  RY.TO.Close  CM.TO.Close 0.8990782
# 5  CM.TO.Close  CM.TO.Close 1.0000000
# 6  RY.TO.Close BNS.TO.Close 0.8700985
# 7 BNS.TO.Close BNS.TO.Close 1.0000000
# 8  TD.TO.Close  TD.TO.Close 1.0000000
1
votes

Adding to accepted answer,

subset(melt(corMatrix)), value > .75 & value < 1.0)
subset(melt(corMatrix)), value < -.75 & value > -1.0)

would be more accurate I suppose. 'Correlation can be negative too'