
I want to calculate correlations between a data frame and a list of data frames. Here is my sample data:

library(lubridate)
v1 = seq(ymd('2000-05-01'),ymd('2000-05-10'),by='day')
v2 = seq(2,20, length = 10)
v3 = seq(-2,7, length = 10)
v4 = seq(-6,3, length = 10)

df1 = data.frame(Date = v1, Tmax = v2, Tmean = v3, Tmin = v4)

v1 = seq(ymd('2000-05-01'),ymd('2000-05-10'),by='day')
v2 = seq(3,21, length = 10)
v3 = seq(-3,8, length = 10)
v4 = seq(-7,4, length = 10)

abc = data.frame(Date = v1, ABC_Tmax = v2, ABC_Tmean = v3, ABC_Tmin = v4)

v1 = seq(ymd('2000-05-01'),ymd('2000-05-10'),by='day')
v2 = seq(4,22, length = 10)
v3 = seq(-4,9, length = 10)
v4 = seq(-8,5, length = 10)

def = data.frame(Date = v1, DEF_Tmax = v2, DEF_Tmean = v3, DEF_Tmin = v4)

v1 = seq(ymd('2000-05-01'),ymd('2000-05-10'),by='day')
v2 = seq(2,20, length = 10)
v3 = seq(-2,8, length = 10)
v4 = seq(-6,3, length = 10)

ghi = data.frame(Date = v1, GHI_Tmax = v2, GHI_Tmean = v3, GHI_Tmin = v4)

df2 <- list(abc, def, ghi)

names(df2) = c("ABC", "DEF", "GHI")

I want all correlation coefficients between df1 and the data frames in df2, but only column-wise, i.e. each column of df1 paired with the matching column of each data frame in df2.

For example:

  • df1$Tmax and all *_Tmax columns in df2
  • df1$Tmean and all *_Tmean columns in df2
  • df1$Tmin and all *_Tmin columns in df2

I know that I can access all Tmax columns like this:

lapply(df2, "[[", 2)

I know how to calculate the correlation between two single columns:

cor.test(df1$Tmax, df2$ABC$ABC_Tmax, method = "spearman")

But how can I do it for all columns at once? I tried this, but it does not work:

cor.test(df1$Tmax, lapply(df2, "[[", 2), method = "spearman")

Any ideas?

Thanks! Yes, that is what I want! Thank you!! It would be even better to get just the correlation coefficients. Oh, and I need to use Spearman; how can I add this? - Mr.Spock

1 Answer


You could use lapply in combination with mapply to run cor.test on each pair of matching columns and extract specific values from the test result. For example, to get the p-value and the estimate (Spearman's rho), we can do:

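# For each data column i (2 = Tmax, 3 = Tmean, 4 = Tmin), test df1's column
# against the matching column of every data frame in df2.
lapply(2:4, function(i) {
  mapply(function(x, y) {
    a <- cor.test(x, y, method = "spearman")
    c(setNames(a$p.value, "pvalue"), a$estimate)
  }, lapply(df2, "[[", i), df1[i])
})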
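
If only the Spearman coefficients are needed, as asked in the comment, cor() with method = "spearman" may be enough, since it skips the test and returns just the coefficient. The following is a minimal sketch, not a definitive solution: it assumes every data frame in df2 has its columns in the same order as df1, and the helper names make_df and spearman_by_column are hypothetical. The sample data are re-created with base R so the block runs without lubridate.

```r
# Re-create the question's sample data with base R only (no lubridate).
dates <- seq(as.Date("2000-05-01"), by = "day", length.out = 10)

# Hypothetical helper: build one data frame from (start, end) pairs.
make_df <- function(prefix, tmax, tmean, tmin) {
  d <- data.frame(Date  = dates,
                  Tmax  = seq(tmax[1],  tmax[2],  length.out = 10),
                  Tmean = seq(tmean[1], tmean[2], length.out = 10),
                  Tmin  = seq(tmin[1],  tmin[2],  length.out = 10))
  names(d)[-1] <- paste0(prefix, names(d)[-1])
  d
}

df1 <- make_df("", c(2, 20), c(-2, 7), c(-6, 3))
df2 <- list(ABC = make_df("ABC_", c(3, 21), c(-3, 8), c(-7, 4)),
            DEF = make_df("DEF_", c(4, 22), c(-4, 9), c(-8, 5)),
            GHI = make_df("GHI_", c(2, 20), c(-2, 8), c(-6, 3)))

# Hypothetical helper: Spearman coefficients only.
# Rows are the columns of `ref`, columns are the elements of `others`.
# Assumption: column i of `ref` corresponds to column i of each data frame.
spearman_by_column <- function(ref, others, cols = 2:4) {
  res <- sapply(others, function(d) {
    vapply(cols,
           function(i) cor(ref[[i]], d[[i]], method = "spearman"),
           numeric(1))
  })
  rownames(res) <- names(ref)[cols]
  res
}

rho <- spearman_by_column(df1, df2)
rho
```

The result is a 3 x 3 matrix with one row per df1 column (Tmax, Tmean, Tmin) and one column per list element (ABC, DEF, GHI). Because every series in the sample data is strictly increasing, all coefficients are 1 here; real data will give other values.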