
I have a correlation matrix and I need to extract the top values and remove the reverse duplicates (sw6 & sw4 = 0.6 correlated means the same as sw4 & sw6 = 0.6).

I used an answer from here that uses reshape to output the top correlations above 0.5. My only remaining problem is the reverse duplicates. This question here does pretty much what I want, which is to remove these reverse duplicates, but it relies on apply, which I'm afraid may slow down my code for large sets. Is there a way to remove reverse dupes without it?

Or is there a better way to get the top correlations of a matrix while only producing unique combinations?

Output after the reshape melt looks like this:

X1   X2   value
sw6  sw4  0.6299408
sw4  sw6  0.6299408
ss   sl   0.5833333
sl   ss   0.5833333
id   ty   0.5724780
ty   id   0.5724780
sl   br   0.5333965
br   sl   0.5333965

But each pair of consecutive rows describes the same correlation.


1 Answer


One option is to set either the upper or the lower triangle of the matrix (upper.tri or lower.tri) to NA and then melt. This has the advantage of pre-processing rather than post-processing: for large datasets, it is better to drop the duplicates before reshaping than to convert to a long dataset and then remove them.

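library(reshape2)
# Blank out the lower triangle and the diagonal so each pair is kept only once
m1[lower.tri(m1, diag = TRUE)] <- NA
# Convert to long format, dropping the NA cells
melt(m1, na.rm = TRUE)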

NOTE: This needs no additional packages beyond the one the OP is already using.
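
If you would rather skip the reshape step altogether, the same triangle idea can be sketched in base R with upper.tri and which(..., arr.ind = TRUE). This is only an illustration: the toy matrix m1, its row and column names, and the result name top are made up for the example, and the 0.5 cutoff is taken from the question.

```r
# Hypothetical toy correlation matrix; names and values are illustrative only.
m1 <- matrix(c(1.0, 0.6, 0.2,
               0.6, 1.0, 0.7,
               0.2, 0.7, 1.0),
             nrow = 3,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# TRUE only for cells in the strict upper triangle that exceed the cutoff,
# so each pair is considered once and the diagonal is ignored.
keep <- upper.tri(m1) & m1 > 0.5

# Row/column positions of the retained cells.
idx <- which(keep, arr.ind = TRUE)

# Build the long table directly from those positions.
top <- data.frame(X1 = rownames(m1)[idx[, "row"]],
                  X2 = colnames(m1)[idx[, "col"]],
                  value = m1[idx])

# Sort from the strongest to the weakest correlation.
top <- top[order(-top$value), ]
print(top)
```

To also catch strong negative correlations, compare abs(m1) against the cutoff instead of m1.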