1
votes

In this example, I have temperature values from 50 different sites, and I would like to correlate Site 1 with each of the 50 sites. From the output of cor.test(), I only want to extract the components "p.value" and "estimate" and store them in two separate columns of a data.frame.

My attempt works, but I don't understand why! I would like to know how I can simplify my code, because right now I have to run the "for" loop twice to get my results.

Here is my example:

# Temperature data
data <- matrix(rnorm(500, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
               dimnames = list(c(paste("Year", 1:100)),
                               c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")

# Extraction
for (i in 1:50) {
  df1 <- cor.test(data[,1], data[,i])
  df[,2:3] <- df1[c("estimate", "p.value")]
}

for (i in 1:50) {
  df1 <- cor.test(data[,1], data[,i])
  df[i,2:3] <- df1[c("estimate", "p.value")]
}

df

I would very much appreciate your help :)

3 Answers

6
votes

I might offer up the following as well (it hides the loop inside lapply):

result <- do.call(rbind, lapply(2:50, function(x) {
  cor.result <- cor.test(data[, 1], data[, x])
  pvalue <- cor.result$p.value
  estimate <- cor.result$estimate
  return(data.frame(pvalue = pvalue, estimate = estimate))
}))
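
A variation on the same idea, as a minimal sketch rather than a definitive version: it swaps lapply() plus rbind for vapply(), covers all 50 sites (including Site 1 against itself), and keeps the site labels. The set.seed() call, the use of rnorm(5000, ...) and the names stats and result are my own assumptions, not part of the question.

```r
# Simulated data with the same shape as in the question (values are random).
set.seed(1)
data <- matrix(rnorm(5000, 10:30, sd = 5), nrow = 100, ncol = 50, byrow = TRUE,
               dimnames = list(paste("Year", 1:100), paste("Site", 1:50)))

# One cor.test() per column; vapply() checks that every result is two doubles.
stats <- vapply(seq_len(ncol(data)), function(i) {
  test <- cor.test(data[, 1], data[, i])
  c(estimate = unname(test$estimate), p.value = test$p.value)
}, FUN.VALUE = c(estimate = 0, p.value = 0))

# vapply() returns a 2 x 50 matrix, so transpose it before building the data frame.
result <- data.frame(label = colnames(data), t(stats), row.names = NULL)
head(result)
```

Because each call returns a plain numeric vector, the columns of result are numeric from the start and no placeholder data frame is needed.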
1
votes

First of all, I'm guessing you had a typo in your code: you should have rnorm(5000, ...) if you want unique values. Otherwise the matrix is filled by cycling through those 500 numbers 10 times.

Anyway, a simple way of doing this would be:

data <- matrix(rnorm(5000, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
               dimnames = list(c(paste("Year", 1:100)),
                               c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
estimates = numeric(50)
pvalues = numeric(50)
for (i in 1:50){
  test <- cor.test(data[,1], data[,i])
  estimates[i] = test$estimate
  pvalues[i] = test$p.value
}
df$Estimate <- estimates
df$P.value <- pvalues
df

Edit: I believe your issue is the line df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value=""). If you run typeof(df$Estimate), you see the column is stored as an integer (the "" placeholder becomes a factor when strings are converted to factors, which was the default before R 4.0; in newer versions it is a character column), while typeof(test$estimate) shows a double, so R doesn't know what you're trying to do with those two values. You can rewrite your code like this:

df <- data.frame(label=paste("Site", 1:50), Estimate=numeric(50), P.value=numeric(50))
for (i in 1:50){
  test <- cor.test(data[,1], data[,i])
  df$Estimate[i] = test$estimate
  df$P.value[i] = test$p.value
}

to make it a little more concise.
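
To make the type issue concrete, here is a small sketch assuming a three-row data frame; the names df_bad and df_ok and the numbers 0.42 and 0.01 are made up for illustration.

```r
# Placeholder columns as in the question: "" makes them non-numeric.
df_bad <- data.frame(label = paste("Site", 1:3), Estimate = "", P.value = "")
class(df_bad$Estimate)   # "character" in R >= 4.0, "factor" in older versions

# Pre-allocating with NA_real_ gives numeric columns from the start.
df_ok <- data.frame(label = paste("Site", 1:3),
                    Estimate = NA_real_, P.value = NA_real_)
class(df_ok$Estimate)    # "numeric"

# Row-wise assignment now stores the values as numbers (made-up values).
df_ok[2, c("Estimate", "P.value")] <- list(0.42, 0.01)
str(df_ok)
```

Using NA instead of 0 as the placeholder also makes it obvious if a row was never filled in.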

1
votes

Similar to colemand77's answer:

Create a correlation function:

cor_fun <- function(x, y, method) {
  tmp <- cor.test(x, y, method = method)
  cbind(r = tmp$estimate, p = tmp$p.value)
}

Apply it over the columns of the data. Transposing the result gives one row per site, with r and p side by side:

t(apply(data, 2, cor_fun, data[, 1], "spearman"))
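
If you want the columns of the result to be labelled, one option is to return a named vector instead of cbind(). This is only a sketch under my own assumptions (simulated data, set.seed(), and the name res are not from the question):

```r
# Simulated data with the same shape as in the question (values are random).
set.seed(1)
data <- matrix(rnorm(5000, 10:30, sd = 5), nrow = 100, ncol = 50, byrow = TRUE,
               dimnames = list(paste("Year", 1:100), paste("Site", 1:50)))

# Return a named numeric vector so apply() can carry the names "r" and "p".
cor_fun <- function(x, y, method) {
  tmp <- cor.test(x, y, method = method)
  c(r = unname(tmp$estimate), p = tmp$p.value)
}

# apply() passes each column as x; y and method are forwarded by name.
res <- t(apply(data, 2, cor_fun, y = data[, 1], method = "spearman"))
head(res)
```

The row names of res are the site names taken from the matrix, so as.data.frame(res) gives a labelled table directly.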