
I need to correlate one gene with 47,000 other genes to find the 10 best correlations. My data frames have the gene names in the first column, followed by one column of data per patient, with the column headers in the first row. Do I need to transpose the data frame to run the correlation tests? If I transpose it, it works, but I believe there is a simpler way to do it. Can somebody help me?


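library(tidyverse)  # provides read_delim(), tibble() and the dplyr verbs used below

# Read the genes x patients table (gene names in column X1)
pancreas_final <- read_delim("path", delim = "\t")

# Transpose so that genes become columns and patients become rows
pancreas_final_t <- t(pancreas_final[, -1])
pancreas_final_t <- as.data.frame(pancreas_final_t)
names(pancreas_final_t) <- pancreas_final$X1
class(pancreas_final_t)
View(pancreas_final_t)

# Correlate CAMP with every gene; the result is a 1 x n_genes matrix
vec_cor <- cor(pancreas_final_t$CAMP, pancreas_final_t)
df_cor <- tibble(gene = colnames(vec_cor), cor = c(vec_cor))  # data_frame() is deprecated
str(df_cor)

df_cor %>%
  arrange(cor)

# Ten highest correlations (CAMP itself comes first, with cor = 1)
df_cor %>%
  arrange(desc(cor)) %>%
  head(n = 10)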
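For testing, the same workflow can be sketched on a small simulated table. This is only an illustration under assumptions: the gene names (`GENE1`, `GENE2`, ...), the patient IDs, the table size and the random values are all made up, and `data.frame()` stands in for `read_delim()`. It also filters out the target gene, since its correlation with itself is always 1 and would otherwise take one of the ten places.

```r
library(dplyr)
library(tibble)

set.seed(1)

# Hypothetical table in the question's layout: gene names in column X1,
# then one numeric column per patient (all values are random)
n_genes <- 50
n_patients <- 8
gene_names <- c("CAMP", paste0("GENE", seq_len(n_genes - 1)))
values <- matrix(rnorm(n_genes * n_patients), nrow = n_genes,
                 dimnames = list(NULL, paste0("patient", seq_len(n_patients))))
pancreas_demo <- data.frame(X1 = gene_names, values, stringsAsFactors = FALSE)

# Transpose so that genes become columns and patients become rows
pancreas_demo_t <- as.data.frame(t(pancreas_demo[, -1]))
names(pancreas_demo_t) <- pancreas_demo$X1

# Correlate the target gene with every gene (result is a 1 x n_genes matrix)
target <- "CAMP"
vec_cor <- cor(pancreas_demo_t[[target]], pancreas_demo_t)
df_cor <- tibble(gene = colnames(vec_cor), cor = as.vector(vec_cor))

# Drop the target itself (cor = 1), then keep the 10 highest correlations
top10 <- df_cor %>%
  filter(gene != target) %>%
  arrange(desc(cor)) %>%
  head(n = 10)

print(top10)
```

If "best" should also include strong negative correlations, sorting on `desc(abs(cor))` instead of `desc(cor)` would be one way to do it.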

Answer


You need to transpose your data frame if you want to calculate the correlations between genes (the rows of your original data frame), because cor() computes correlations between columns. Try this for the correlations between genes:

correlation_btw_genes = cor(pancreas_final_t)

If you don't transpose your data frame, the cor() function will calculate the correlations between your patients (the columns) instead.
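To see the difference, here is a small sketch on a made-up genes x patients matrix (the gene and patient names are hypothetical). It checks the dimensions of `cor()` with and without transposing and then, assuming only one target gene is of interest, computes just that gene's correlations: once via `t()` and once row by row with `apply()`, which avoids an explicit transpose. For the real data this matters: a full 47,000 x 47,000 correlation matrix holds about 2.2 billion doubles, roughly 17.7 GB at 8 bytes each, whereas the target gene's correlations are only 47,000 numbers.

```r
set.seed(42)

# Hypothetical genes x patients matrix, same orientation as the raw file
# (genes in rows, patients in columns); names are made up for illustration
expr_mat <- matrix(
  rnorm(6 * 5), nrow = 6,
  dimnames = list(c("CAMP", paste0("GENE", 1:5)), paste0("patient", 1:5))
)

# cor() correlates columns, so without transposing you get patients x patients
dim(cor(expr_mat))      # 5 5

# After transposing, columns are genes, so you get genes x genes
dim(cor(t(expr_mat)))   # 6 6

# Correlating only the target gene avoids building the full genes x genes matrix
target <- expr_mat["CAMP", ]
cor_via_t <- cor(t(expr_mat), target)[, 1]

# Row-wise alternative without an explicit transpose:
# apply() passes each row to cor() as x, with the target as y
cor_via_apply <- apply(expr_mat, 1, cor, y = target)

all.equal(cor_via_t, cor_via_apply)  # TRUE
```

The `apply()` version is arguably the closest to "not transposing", but it calls `cor()` once per gene, so on 47,000 rows it may well be slower than a single `cor()` call on the transposed matrix.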