
I have two data frames and would like to run t.test on each pair of matching columns. Both are subsets of one large data frame, so the column names are identical and in the same order (ncol is about 20000), with nrow(df1)=25 and nrow(df2)=23.

Example:

treatment<-matrix(rnorm(50), ncol=10)
control<-matrix(rnorm(50), ncol=10)

treatment
            [,1]        [,2]       [,3]       [,4]       [,5]       [,6]
[1,]  0.23442246  1.02256703  1.0499998  0.2913643 -1.2083822  0.3778403
[2,] -0.68888047 -0.03961717 -0.9978793 -0.9792061 -0.1831634  0.6140542
[3,] -1.88273887 -0.49701513  0.1845197  0.4385338  1.2249121  0.5444027
[4,]  1.21359446  0.87333933  0.5615304  0.3803339  1.1294489 -0.8777454
[5,] -0.02908159 -1.50296138  0.4624656  0.1335046  1.1665818 -0.4475185
          [,7]      [,8]       [,9]      [,10]
[1,] 0.5987723 0.5910937  0.4334874 -1.4198250
[2,] 0.2027346 0.8078187 -1.0573069  1.0727554
[3,] 0.5490159 0.5109912  1.7247428  1.7745333
[4,] 0.3044544 0.6476548  1.1959365 -0.1220841
[5,] 1.8681375 0.8451147  0.4283893  0.1044125

control
          [,1]       [,2]       [,3]        [,4]        [,5]        [,6]
[1,]  0.6712834 -0.3775649  0.7741285  0.51224345  0.24128336  1.02580198
[2,]  0.3894112 -0.1835289  0.4982122  1.73512459  0.08991013 -0.04406897
[3,]  1.7068503  0.7909355 -0.3341426  0.08780239 -1.11563321  2.09984105
[4,] -0.7634818 -1.3672888  0.2161816 -0.65170516  0.81247509  1.68008404
[5,]  0.5787616  0.1704100 -0.3166737  0.90167409 -2.34854292  0.31571255
           [,7]       [,8]       [,9]      [,10]
[1,] -1.6111883  0.1019497 -0.1975491 -0.3776000
[2,]  0.7533329  1.1540590  1.0050663  2.0137347
[3,]  1.2224161  1.4411853 -0.4801494 -0.3891034
[4,]  0.1905461  0.9767801 -0.1442578 -0.9946735
[5,] -1.9581454 -0.2874181 -1.0421440 -0.6177782

I did some searching on SO and came across mapply():

mapply(t.test,treatment,control)
Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
  not enough 'x' observations

But t.test works when I call it on a single pair of columns:

t.test(treatment[,1],control[,1])

  Welch Two Sample t-test
data:  treatment[, 1] and control[, 1]
t = -1.1541, df = 7.492, p-value = 0.284
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.2577187  0.7635152
sample estimates:
mean of x  mean of y
-0.2305368  0.5165649

What is wrong here?

1 Answer


treatment and control are matrix objects, and a matrix is just a vector (like c(1,2,3)) with a dim attribute. mapply therefore iterates over the individual elements, not the columns, and tries to run t.test on one number against one number. E.g.:

treatment[1]
#[1] 0.7545039
control[1]
#[1] -0.3926361

t.test(treatment[1],control[1])
#Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) : 
#  not enough 'x' observations
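
To see why mapply pairs up single numbers, it can help to compare what length() reports for a matrix and for the same data as a data.frame. This is only a small sketch; the seed and the names m and df are arbitrary choices for illustration, not part of the question.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
m <- matrix(rnorm(50), ncol = 10)

length(m)                        # 50: mapply would iterate over 50 single numbers
dim(m)                           # 5 10: the shape is stored only as an attribute
identical(m[1], m[1, 1])         # TRUE: single-index access walks the underlying vector

df <- as.data.frame(m)
length(df)                       # 10: a data.frame is a list with one element per column
```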

If you convert your matrices to data.frame objects, mapply will iterate column by column, because a data.frame is a list of columns, and it will work just fine:

mapply(t.test,as.data.frame(treatment),as.data.frame(control))

#            V1                                     
#statistic   -0.7829546                             
#parameter   7.698139                               
#p.value     0.4570611                              
#etc etc 
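
The object that mapply returns here is a matrix whose cells are list elements (one row per component of the test result, one column per input column), so pulling out a single quantity takes one extra step. A minimal sketch, assuming the same kind of random matrices as in the question; the seed and the names res and p_values are just for illustration:

```r
set.seed(42)                     # arbitrary seed, only for reproducibility
treatment <- matrix(rnorm(50), ncol = 10)
control   <- matrix(rnorm(50), ncol = 10)

res <- mapply(t.test, as.data.frame(treatment), as.data.frame(control))

dim(res)        # rows are components of the test result, columns are V1..V10
rownames(res)   # includes "statistic", "parameter", "p.value", ...

# Each cell holds a list element, so flatten the row to get a numeric vector.
p_values <- unlist(res["p.value", ])
p_values
```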

In this case, I think Map is the better choice for readability. It never simplifies its result, so you get a plain list with one complete t.test result per column:

Map(t.test,as.data.frame(treatment),as.data.frame(control))

#$V1
#
#        Welch Two Sample t-test
#
#data:  dots[[1L]][[1L]] and dots[[2L]][[1L]]
#t = -0.783, df = 7.698, p-value = 0.4571
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -1.525349  0.756036
#sample estimates:
#  mean of x   mean of y 
#-0.31246928  0.07218723
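
With around 20000 columns you will probably want one number per column instead of the full printout. Here is a rough sketch of two ways to collect the p-values, assuming matrices shaped like the example in the question; the names tests, p_values and p_by_index are made up for illustration. The second version loops over column indices and skips the data.frame conversion entirely.

```r
set.seed(42)                     # arbitrary seed, only for reproducibility
treatment <- matrix(rnorm(50), ncol = 10)
control   <- matrix(rnorm(50), ncol = 10)

# Option 1: keep the full test objects from Map, then extract what you need.
tests    <- Map(t.test, as.data.frame(treatment), as.data.frame(control))
p_values <- vapply(tests, function(tt) tt$p.value, numeric(1))

# Option 2: index the matrix columns directly.
p_by_index <- vapply(
  seq_len(ncol(treatment)),
  function(j) t.test(treatment[, j], control[, j])$p.value,
  numeric(1)
)

# Both approaches run the same Welch tests, so the values agree.
all.equal(unname(p_values), p_by_index)
```

Option 1 is handy if you also need the statistic or confidence interval later; Option 2 avoids holding every test object in memory.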