I have a data set with 93 variables. I want to compute the correlations between all pairs of variables and then keep only those with an absolute value above 0.5. I based my code on the question "How to compute correlations between all columns in R and detect highly correlated variables", and it looks like this:
library(tibble)
library(dplyr)
library(tidyr)
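co_mat = data %>%
  as.matrix %>%                       # cor() needs an all-numeric matrix
  cor %>%                             # 93 x 93 correlation matrix
  as.data.frame %>%
  rownames_to_column(var = 'var1') %>%
  gather(var2, value, -var1)          # long format: one row per (var1, var2) pair
co_mat2 = filter(co_mat, abs(value) > .5)   # keep only |r| > 0.5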
This works, except that the result contains many rows where a variable is correlated with itself (these always have a value of 1).
It also contains redundant correlations: the same pair of variables appears twice, once as var1 = A, var2 = B and once as var1 = B, var2 = A.
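To make both problems concrete, here is a small sketch that runs the same pipeline on R's built-in mtcars data (an assumption, standing in for the 93-variable data set) and counts the offending rows:

```r
library(tibble)
library(dplyr)
library(tidyr)

# Assumption: mtcars (11 numeric columns) stands in for the real data
co_mat <- mtcars %>%
  as.matrix() %>%
  cor() %>%
  as.data.frame() %>%
  rownames_to_column(var = "var1") %>%
  gather(var2, value, -var1)

# Every ordered pair is present, so there are ncol^2 rows: 121 for mtcars
nrow(co_mat)

# Self-correlations: one row per variable (11 here), each equal to 1
self_rows <- filter(co_mat, var1 == var2)
nrow(self_rows)

# Each off-diagonal pair appears twice, e.g. mpg/cyl and cyl/mpg
pair_rows <- filter(co_mat, (var1 == "mpg" & var2 == "cyl") |
                            (var1 == "cyl" & var2 == "mpg"))
pair_rows
```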
I would like to return a table like co_mat2, but without the rows where a variable is correlated with itself and with each redundant pair of variables kept only once.
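One possible approach, sketched here on mtcars as a stand-in for the real data (an assumption): because `cor()` returns a symmetric matrix, blank out the diagonal and the lower triangle before reshaping, so only one copy of each pair survives.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Assumption: mtcars stands in for the real data; swap in your own data frame
cm <- cor(as.matrix(mtcars))

# Set the diagonal (self-correlations) and the lower triangle
# (mirror-image duplicates) to NA, keeping each pair of variables once
cm[lower.tri(cm, diag = TRUE)] <- NA

co_mat2 <- cm %>%
  as.data.frame() %>%
  rownames_to_column(var = "var1") %>%
  gather(var2, value, -var1, na.rm = TRUE) %>%  # drops the blanked-out cells
  filter(abs(value) > .5) %>%
  arrange(desc(abs(value)))                     # strongest correlations first

co_mat2
```

If you would rather work on the long table you already have, `filter(co_mat, var1 < var2, abs(value) > .5)` should give the same correlations (possibly with var1 and var2 swapped within a pair): the string comparison is always false for a variable paired with itself and true for exactly one ordering of every other pair, assuming the column names are unique.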
Thank you in advance.