I am new to the R language and its packages. To run a pairwise Pearson correlation analysis of about 9000 genes in matrix format, I used the psych package in R, following the info from the link here.

However, I ran into some problems in the analysis that I could not solve using the psych manual.

First: a general error, "Error in cor(x, use = use, method = method) : 'x' must be numeric". When I remove the element names and keep only the values, it works. How can I include the header as well? The following code produces the above error:

library("psych")
myData <- read.clipboard.tab(header = TRUE) 
corr.test(myData)

My second question: what is the best way to filter pairs with a Pearson correlation >= 0.5? Should I do it separately, or is there a method in R itself?

edit:

name    experiment1 experiment2 experiment3
gene1   -0.05814212 -0.3844461  1.4553193
gene2   -0.22045895 0.43413392  1.774345
gene3   1.4845127   -2.4423246  0.37565866
gene4   2.4195287   2.6537158   2.6640055
Could you provide a small reproducible example that gives the error? If you have a non-numeric first column, you could get the error. I assume that you removed the first column, and then the error was gone. One option would be to subset the dataset by removing the first column and set the rownames to the first column, i.e. myData1 <- myData[-1]; rownames(myData1) <- myData[,1] - akrun
Actually, I need to retain the gene names in the output correlation table produced by the command corr.test(myData). Yes, you are right, it works when I remove the gene names. @akrun - Lee
By doing corr.test(myData), you are comparing the columns with each other. I don't know how you want to retain the gene names. Can you show the expected format? Do you need corr.test(t(myData1))? Here, myData1 is based on my previous comment. - akrun
Sorry for the confusion. In the edit, I have shown what my data looks like. It is microarray data across many samples. Here I wanted to see the correlation between each pair of genes, for instance between gene1 and gene2, gene1 and gene3, gene2 and gene4, etc. @akrun - Lee
I posted the comments as a solution. Could you check? - akrun

1 Answer

You could try

library(psych)
myData1 <- myData[-1]
rownames(myData1) <- myData[,1]
Corrt <- corr.test(t(myData1))
Corrt$r[Corrt$r >= 0.5]

If you need to preserve the matrix structure, set the values below 0.5 to NA instead:

 is.na(Corrt$r) <- Corrt$r < 0.5
 Corrt$r
 #          gene1     gene2 gene3     gene4
 #gene1 1.0000000 0.8801186    NA        NA
 #gene2 0.8801186 1.0000000    NA 0.7761407
 #gene3        NA        NA     1        NA
 #gene4        NA 0.7761407    NA 1.0000000
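
If a list of gene pairs is more useful than a masked matrix, one possible approach is to take only the upper triangle of the correlation matrix, so each pair appears once and the diagonal is skipped. The following is a minimal sketch, not part of the original answer: it uses base R's cor() instead of corr.test() (so no p-values are computed), and the names expr, keep, and gene_pairs are made up for illustration. It assumes the gene names are unique.

```r
# Toy data from the question, with gene names stored as row names
# (assumption: names are unique, otherwise row.names will fail)
expr <- data.frame(
  experiment1 = c(-0.05814212, -0.22045895, 1.4845127, 2.4195287),
  experiment2 = c(-0.3844461, 0.43413392, -2.4423246, 2.6537158),
  experiment3 = c(1.4553193, 1.774345, 0.37565866, 2.6640055),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# cor() correlates columns, so transpose to put the genes in columns
r <- cor(t(expr), method = "pearson")

# Row/column indices of upper-triangle cells with r >= 0.5
# (upper.tri() excludes the diagonal, so each pair is listed once)
keep <- which(upper.tri(r) & r >= 0.5, arr.ind = TRUE)

# Long-format table: one row per gene pair
gene_pairs <- data.frame(
  gene_a = rownames(r)[keep[, "row"]],
  gene_b = colnames(r)[keep[, "col"]],
  r = r[keep],
  stringsAsFactors = FALSE
)
gene_pairs
```

On the toy data this leaves the gene1/gene2 and gene2/gene4 pairs, matching the non-NA off-diagonal entries in the matrix above. For around 9000 genes the full matrix has roughly 81 million cells, so whether this fits comfortably in memory depends on the machine.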

data

myData <- structure(list(name = c("gene1", "gene2", "gene3", "gene4"), 
experiment1 = c(-0.05814212, -0.22045895, 1.4845127, 2.4195287
), experiment2 = c(-0.3844461, 0.43413392, -2.4423246, 2.6537158
), experiment3 = c(1.4553193, 1.774345, 0.37565866, 2.6640055
)), .Names = c("name", "experiment1", "experiment2", "experiment3"
), class = "data.frame", row.names = c(NA, -4L))
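
As an alternative to dropping the name column after import, the names can be turned into row names while reading, so that every remaining column is numeric from the start. This is a sketch under assumptions: it uses base R's read.table() on an inline copy of the question's table rather than read.clipboard.tab(), and txt and expr are illustrative names only.

```r
# Inline copy of the table from the question (whitespace-separated)
txt <- "name    experiment1 experiment2 experiment3
gene1   -0.05814212 -0.3844461  1.4553193
gene2   -0.22045895 0.43413392  1.774345
gene3   1.4845127   -2.4423246  0.37565866
gene4   2.4195287   2.6537158   2.6640055"

# row.names = 1 uses the first column as row names instead of data
expr <- read.table(text = txt, header = TRUE, row.names = 1)

# Every remaining column should now be numeric, which is what cor() requires
stopifnot(all(sapply(expr, is.numeric)))

# Gene-by-gene Pearson correlations; dimnames carry the gene names
r <- cor(t(expr))
rownames(r)
```

Checking sapply(expr, is.numeric) before calling a correlation function is a quick way to spot the column that triggers the "'x' must be numeric" error.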