
I have created a correlation matrix in R using:

df_corr <- cor(df, use = "pairwise.complete.obs")

Then I melted the matrix using:

library(reshape2)  # provides melt()
df_corr_melted <- melt(df_corr)

to get:

Var1 Var2 value
A    B    .1
A    C    .2
A    A    1
B    A    .1
B    C    .4
B    B    1
C    A    .2
C    B    .4
C    C    1

I'd like to remove the redundant pairs. For example, I only need corr(A,C), not corr(C,A). I read through the filtering commands in dplyr, but since each (Var1, Var2) combination is unique, these rows aren't true duplicates. Any suggestions?

Subset (filter with dplyr) where Var1 <= Var2. Use < instead of <= if you want to also omit the trivial X,X correlations. - Gregor Thomas
@Gregor, great answer. - jaslibra
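Gregor's condition may need one adjustment in practice: reshape2's melt() typically returns Var1 and Var2 as factors, and R's < and <= are not meaningful for unordered factors (they return NA with a warning). A minimal sketch, assuming both columns share the same levels in the matrix's column order (true for a symmetric correlation matrix), compares the factor codes instead. To keep it runnable without reshape2, it uses base R's as.data.frame(as.table()) as a stand-in for melt(); the toy matrix reuses the question's A, B, C values.

```r
# Toy correlation matrix mirroring the question's A, B, C example
df_corr <- matrix(c(1,  .1, .2,
                    .1, 1,  .4,
                    .2, .4, 1),
                  nrow = 3,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Base-R stand-in for reshape2::melt(): columns Var1, Var2 (factors) and value
long <- as.data.frame(as.table(df_corr), responseName = "value")

# Factors can't be compared with < or <=, so compare their integer codes,
# i.e. the row/column positions in the matrix. Use <= to keep the diagonal.
upper <- subset(long, as.integer(Var1) < as.integer(Var2))
upper
```

With dplyr, the same condition goes inside filter(), e.g. filter(df_corr_melted, as.integer(Var1) < as.integer(Var2)). Comparing as.character(Var1) < as.character(Var2) also works, but it orders names alphabetically (locale-dependent) rather than by matrix position.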

3 Answers


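Before melting, you can pair each combination from combn() with the matching lower-triangle value. The lower.tri() mask reads the lower triangle column by column, which is the same order in which combn() generates the pairs:

pairs <- t(combn(colnames(df_corr), 2))  # one row per unique pair
data.frame(Var1 = pairs[, 1], Var2 = pairs[, 2], value = df_corr[lower.tri(df_corr)])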
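This answer relies on lower.tri() returning the values in the same order in which combn() generates the pairs. A sketch that checks this on a hypothetical 4-variable matrix, and also looks each pair up by name with a two-column character index matrix (base R's [ supports this for arrays with dimnames), so the result no longer depends on the ordering at all:

```r
# Hypothetical 4-variable data so the ordering is checked beyond the 3x3 case
set.seed(1)
df_corr <- cor(matrix(rnorm(40), ncol = 4,
                      dimnames = list(NULL, c("w", "x", "y", "z"))))

pairs <- t(combn(colnames(df_corr), 2))    # one row per unique pair, in combn order
by_position <- df_corr[lower.tri(df_corr)]  # lower triangle, read column by column
by_name <- df_corr[pairs]                   # look each pair up by row/column name

# The two agree because the matrix is symmetric and both run in the same order
stopifnot(isTRUE(all.equal(by_name, by_position)))

long <- data.frame(Var1 = pairs[, 1], Var2 = pairs[, 2], value = by_name)
long
```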

You can do this in one go by using replace to set the diagonal and either the upper or lower triangle of the matrix to NA, and then just melt(..., na.rm = TRUE):

Demo:

library(reshape2)
melt(replace(df_corr, lower.tri(df_corr, diag = TRUE), NA), na.rm = TRUE)
#   Var1 Var2      value
# 4   aa   bb  0.5776151
# 7   aa   cc -0.4059593
# 8   bb   cc -0.5673487

Sample data:

set.seed(123)
df_corr <- cor(data.frame(aa = rnorm(10), bb = rnorm(10), cc = rnorm(10)), use = "p")
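The second argument of lower.tri() is diag, so lower.tri(df_corr, TRUE) masks the diagonal as well as the lower triangle. A small sketch on made-up data showing how many cells survive each mask:

```r
# Small made-up data; aa and cc are perfectly negatively correlated
m <- cor(data.frame(aa = 1:5, bb = c(2, 1, 4, 3, 5), cc = 5:1))

# diag = TRUE also blanks the diagonal, leaving only the n*(n-1)/2 unique pairs
no_diag <- replace(m, lower.tri(m, diag = TRUE), NA)
# default diag = FALSE keeps the self-correlations: n*(n+1)/2 cells
with_diag <- replace(m, lower.tri(m), NA)

no_diag
sum(!is.na(no_diag))    # 3 for n = 3
sum(!is.na(with_diag))  # 6 for n = 3
```

Passing with_diag to melt(..., na.rm = TRUE) should keep the X,X rows, the same effect as the <= variant in the comments.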

Here's a way that skips the correlation matrix and computes each unique pair directly from the data, using combn and apply:

c_names <- combn(names(dat), 2)

cors <- apply(c_names, 2, FUN = function(x) cor(dat[x[1]], dat[x[2]]))

cbind.data.frame(t(c_names), cors)

   1  2       cors
1 aa bb  0.5776151
2 aa cc -0.4059593
3 bb cc -0.5673487

Data

set.seed(123)
dat <- data.frame(aa = rnorm(10),
                  bb = rnorm(10),
                  cc = rnorm(10))
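A variant of the same loop, sketched under the assumption that you want the question's Var1/Var2/value layout: it labels the columns directly, extracts plain vectors with [[ ]], and passes use = "pairwise.complete.obs" to mirror the question's use = "p". Like the answer above, it recomputes every correlation from the raw data rather than reading it from df_corr.

```r
set.seed(123)
dat <- data.frame(aa = rnorm(10), bb = rnorm(10), cc = rnorm(10))

c_names <- combn(names(dat), 2)  # 2 x k character matrix, one pair per column

out <- data.frame(
  Var1  = c_names[1, ],
  Var2  = c_names[2, ],
  # [[ ]] returns plain vectors, so cor() gives a single number per pair
  value = apply(c_names, 2, function(x)
    cor(dat[[x[1]]], dat[[x[2]]], use = "pairwise.complete.obs"))
)
out
```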