I have 14000 genes (column: Gene) and 200 samples (columns: sample1, sample2, ...).
I am trying to calculate correlations for ~14000 genes, all against all, collect each pair's correlations together with the required columns from the dataset (test_df) in a new data frame (df1), and write the results to a text file.
When I run the code, I get the correlations between (Gene1 and Gene2) and (Gene1 and Gene3). When the loop comes to Gene2, it breaks with the error:
Error in cor.test.default(as.matrix(test_df[i, ][, 3:length(test_df)]), : not enough finite observations
I have 3 to 4 values per row, so this shouldn't be the case.
Please suggest an efficient way of doing this, since I have to compute correlations for 14000 genes. How can I run this code on multiple cores to get results faster?
Please find the code and the resulting file below.
Thanks in advance.
test_df <- data.frame(ID=c("ID_3721", "ID_537", "ID_555"),
                      Gene=c("Gene1","Gene2","Gene3"),
                      sample1=c(11397,78191,44838),
                      sample2=c(33768,33763,7680),
                      sample3=c(74521,33268,72367),
                      sample4=c(51486,11435,28772),
                      sample5=c(73539,21486,0))
test_df
##        ID  Gene sample1 sample2 sample3 sample4 sample5
## 1 ID_3721 Gene1   11397   33768   74521   51486   73539
## 2  ID_537 Gene2   78191   33763   33268   11435   21486
## 3  ID_555 Gene3   44838    7680   72367   28772       0
for(i in 1:2){
  for(j in i+1:3){
    p.cor <- cor.test(as.matrix(test_df[i,][,3:length(test_df)]), as.matrix(test_df[j,][,3:length(test_df)]), method="pearson")$estimate
    s.cor <- cor.test(as.matrix(test_df[i,][,3:length(test_df)]), as.matrix(test_df[j,][,3:length(test_df)]), method="spearman")$estimate
    df1 <- data.frame(ID1 = test_df[i,1],
                      ID2 = test_df[j,1],
                      Name1 = test_df[i,2],
                      Name2 = test_df[j,2],
                      correlation.p = p.cor,
                      correlation.s = s.cor)
    write.table(df1, file="genecorr.txt", row.names=FALSE, sep="\t", append=TRUE, quote=FALSE, col.names = !file.exists("genecorr.txt"))
  }
}
**Error in cor.test.default(as.matrix(test_df[i, ][, 3:length(test_df)]), :
not enough finite observations**
genecorr.txt
ID1 ID2 NAME1 NAME2 correlation.p correlation.s
ID_3721 ID_537 Gene1 Gene2 -0.136733508500744 -0.1
ID_3721 ID_555 Gene1 Gene3 0.145998550191942 0.3
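A likely cause of the error is operator precedence in the inner loop: in R, `:` binds tighter than `+`, so `i+1:3` is evaluated as `i + (1:3)`. For `i = 1` that gives `j = 2, 3, 4`, and row 4 does not exist in the 3-row example, so indexing it yields a row of `NA`s and `cor.test` reports "not enough finite observations". The sketch below shows the difference and a corrected loop, assuming only the estimates are needed. The names `expr`, `rows` and `pairs_df` are introduced here for illustration and are not part of the original code.

```r
test_df <- data.frame(
  ID      = c("ID_3721", "ID_537", "ID_555"),
  Gene    = c("Gene1", "Gene2", "Gene3"),
  sample1 = c(11397, 78191, 44838),
  sample2 = c(33768, 33763, 7680),
  sample3 = c(74521, 33268, 72367),
  sample4 = c(51486, 11435, 28772),
  sample5 = c(73539, 21486, 0)
)

# Operator precedence: ':' is evaluated before '+'
i <- 1
i + 1:3     # 2 3 4  (i + (1:3); row 4 is out of range, so it is all NA)
(i + 1):3   # 2 3

# Extract the numeric part once instead of re-slicing the data frame per pair
expr <- as.matrix(test_df[, 3:ncol(test_df)])
n <- nrow(expr)

# Collect one small data frame per gene pair, then bind them together
rows <- list()
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {          # parentheses make j start at i + 1
    rows[[length(rows) + 1]] <- data.frame(
      ID1   = test_df$ID[i],
      ID2   = test_df$ID[j],
      Name1 = test_df$Gene[i],
      Name2 = test_df$Gene[j],
      correlation.p = cor(expr[i, ], expr[j, ], method = "pearson"),
      correlation.s = cor(expr[i, ], expr[j, ], method = "spearman")
    )
  }
}
pairs_df <- do.call(rbind, rows)
pairs_df
```

Using `seq_len(n - 1)` and `(i + 1):n` instead of hard-coded bounds means the same loop should work unchanged when `test_df` has more rows, although a pairwise loop is likely to be slow for 14000 genes.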
If you are only interested in the estimates, `cor` might be enough, and probably faster than `cor.test`. – DJJ
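Following the suggestion to use `cor`, one possible approach is to skip the pairwise loop entirely: `cor` applied to a matrix correlates its columns with each other, so transposing the expression matrix (genes in columns) gives all gene-by-gene correlations in one call. This is a minimal sketch under that assumption. The names `expr`, `p_mat`, `s_mat`, `idx`, `res` and `out_file` are illustrative, and the temporary output path is only there to keep the example self-contained.

```r
test_df <- data.frame(
  ID      = c("ID_3721", "ID_537", "ID_555"),
  Gene    = c("Gene1", "Gene2", "Gene3"),
  sample1 = c(11397, 78191, 44838),
  sample2 = c(33768, 33763, 7680),
  sample3 = c(74521, 33268, 72367),
  sample4 = c(51486, 11435, 28772),
  sample5 = c(73539, 21486, 0)
)

# Numeric matrix with genes in rows and samples in columns
expr <- as.matrix(test_df[, 3:ncol(test_df)])
rownames(expr) <- as.character(test_df$Gene)

# cor() works column-wise, so transpose to correlate genes with genes
p_mat <- cor(t(expr), method = "pearson")
s_mat <- cor(t(expr), method = "spearman")

# Keep each unordered pair once: positions in the upper triangle
idx <- which(upper.tri(p_mat), arr.ind = TRUE)

res <- data.frame(
  ID1   = test_df$ID[idx[, "row"]],
  ID2   = test_df$ID[idx[, "col"]],
  Name1 = test_df$Gene[idx[, "row"]],
  Name2 = test_df$Gene[idx[, "col"]],
  correlation.p = p_mat[idx],
  correlation.s = s_mat[idx]
)

# Write everything in one call instead of appending once per pair
out_file <- file.path(tempdir(), "genecorr.txt")  # hypothetical output path
write.table(res, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)
```

For 14000 genes each full correlation matrix has 14000 x 14000 entries, which is roughly 1.5 GB of doubles, and the long table has about 98 million rows, so memory is worth checking before running this at full size. Since the heavy work is a single vectorized call, running it on multiple cores may not be necessary; if memory is the limit, one option is to call `cor(t(expr[block, ]), t(expr))` on blocks of rows and write each block's results separately.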