2
votes

I have a data frame describing three variables: a, b and c. The first two columns hold every pairwise combination of the variables, and the third holds the correlation for each pair, as shown below.

> var1 <- c("a","a","b")
> var2 <- c("b","c","c")
> r <- c(.55,.25,.75)
> as.data.frame(cbind(var1,var2,r))
  var1 var2    r
1    a    b  0.55
2    a    c  0.25
3    b    c  0.75

Is it possible to turn this data frame of correlations into a correlation matrix object in R? I would also like to use some of R's plotting functions on the result.

Ultimately, what I want is a matrix that looks like this:

    a    b    c
a   1  .55  .25
b .55    1  .25
c .25  .75  .75
xtabs(r ~ ., data=data.frame(var1,var2,r)) may get close depending on what you want exactly. If you could clarify I can adjust such an answer. – thelatemail
Sorry for the confusion and thanks for your help. This is close; what I would ultimately like is a matrix with rows and columns a, b, c: row a = 1, .55, .25; row b = .55, 1, .75; row c = .25, .75, 1. – Jean1213
Because I couldn't get it to work in the comments, I added a picture of the matrix I am kind of aiming for in the main question. – Jean1213
@Jean1213 the matrix you provided contains some wrong values (typos?), e.g., cor(c,c) should be 1 and not 0.75. – Sandipan Dey

2 Answers

2
votes

Try this:

vars <- unique(c(var1, var2))
df <- cbind.data.frame(var1,var2,r)
cor.df <- expand.grid(vars, vars)
cor.df <- rbind(merge(cor.df, df, by.x=c('Var1', 'Var2'), by.y=c('var1', 'var2')),
                merge(cor.df, df, by.x=c('Var2', 'Var1'), by.y=c('var1', 'var2')),
                data.frame(Var1=vars, Var2=vars, r=1))
library(reshape2)
cor.mat <- dcast(cor.df, Var1~Var2, value.var='r')
rownames(cor.mat) <- cor.mat[,1]
cor.mat <- as.matrix(cor.mat[-1])
cor.mat
#      a    b    c
# a 1.00 0.55 0.25
# b 0.55 1.00 0.75
# c 0.25 0.75 1.00

# plot the correlation matrix
library(ggplot2)
ggplot(data = cor.df, aes(x=Var1, y=Var2, fill=r)) + 
  geom_tile()


2
votes

The vector r already holds the off-diagonal entries of your correlation matrix, so you do not need the data.frame for this. Starting from a matrix whose elements are all 1 and filling its lower and upper triangles with r is enough. This relies on the pairs being ordered as in the question (a-b, a-c, b-c).

var <- unique(c(var1,var2))
n <- length(var) # number of variables, not number of pairs
corr <- matrix(1,nrow=n,ncol=n) # an n x n matrix of 1s; the diagonal stays 1
corr[lower.tri(corr,diag = FALSE)] <- r # fill the lower triangle with r, column by column
corr[upper.tri(corr,diag = FALSE)] <- t(corr)[upper.tri(corr,diag = FALSE)] # mirror it into the upper triangle
corr <- as.data.frame(corr) # formatting
row.names(corr) <- var # row names
colnames(corr) <- var # column names
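
If the pairs might arrive in a different order, or there are more than three variables, indexing the matrix by row and column names avoids relying on the order of r. The following is only a minimal sketch, assuming var1, var2 and r are the vectors from the question; cor_mat is just a name chosen here for illustration.

```r
# Vectors from the question
var1 <- c("a", "a", "b")
var2 <- c("b", "c", "c")
r <- c(.55, .25, .75)

vars <- unique(c(var1, var2))

# Identity matrix: 1s on the diagonal, 0s elsewhere
# (cor_mat is a name made up for this sketch)
cor_mat <- diag(length(vars))
dimnames(cor_mat) <- list(vars, vars)

# A two-column character matrix indexes cells by (row name, column name)
cor_mat[cbind(var1, var2)] <- r  # cells (var1, var2)
cor_mat[cbind(var2, var1)] <- r  # mirrored cells (var2, var1)

# Sanity check: a correlation matrix should be symmetric
isSymmetric(cor_mat)
```

Any pair that is missing from the input stays at 0 with this approach, so it is worth checking that every combination is present before plotting.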

The corrplot package has a function, corrplot, that is well suited to plotting a correlation matrix and offers several display options (see the method argument). Here is an example:

library(corrplot)
corrplot(as.matrix(corr),method="circle")
