20
votes

Is it possible somehow to do a t.test over multiple variables against the same categorical variable without going through a reshaping of the dataset as follows?

data(mtcars)
library(dplyr)
library(tidyr)
j <- mtcars %>% gather(var, val, disp:qsec)
t <- j %>% group_by(var) %>% do(te = t.test(val ~ vs, data = .))

t %>% summarise(p = te$p.value)

I´ve tried using

mtcars %>% summarise_each_(funs = (t.test(. ~ vs))$p.value, vars = disp:qsec)

but it throws an error.

Bonus: How can t %>% summarise(p = te$p.value) also include the name of the grouping variable?

4
You should consider adding whitespace to your code.x4nd3r
This may be a partial solution (void of summarise portion) by data.table : (step1) library(data.table) (step2) setDT(j) (Step3) j[, te := t.test(value~vs), by=variable][]KFB

4 Answers

17
votes

After all discussions with @aosmith and @Misha, here is one approach. As @aosmith wrote in his/her comments, You want to do the following.

mtcars %>%
    summarise_each(funs(t.test(.[vs == 0], .[vs == 1])$p.value), vars = disp:qsec)

#         vars1        vars2      vars3        vars4        vars5
#1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06

vs is either 0 or 1 (group). If you want to run a t-test between the two groups in a variable (e.g., dips), it seems that you need to subset data as @aosmith suggested. I would like to say thank you for the contribution.

What I originally suggested works in another situation, in which you simply compare two columns. Here is sample data and codes.

foo <- data.frame(country = "Iceland",
                  year = 2014,
                  id = 1:30,
                  A = sample.int(1e5, 30, replace = TRUE),
                  B = sample.int(1e5, 30, replace = TRUE),
                  C = sample.int(1e5, 30, replace = TRUE),
                  stringsAsFactors = FALSE)

If you want to run t-tests for the A-C, and B-C combination, the following would be one way.

foo2 <- foo %>%
        summarise_each(funs(t.test(., C, pair = TRUE)$p.value), vars = A:B) 

names(foo2) <- colnames(foo[4:5])

#          A         B
#1 0.2937979 0.5316822
11
votes

I like the following solution using the powerful "broom" package:

library("dplyr")
library("broom")

your_db %>%
  group_by(grouping_variable1, grouping_variable2 ...) %>%
  do(tidy(t.test(variable_u_want_2_test ~ dicothomous_grouping_var, data = .)))
6
votes

Realizing that the question is fairly old, here is another answer for the reference of future generations.

This is more general than the accepted answer since it allows for dynamically generated variable names rather than hard-coded.

vars_to_test <- c("disp","hp","drat","wt","qsec")
iv <- "vs"

mtcars %>%
  summarise_each_(
    funs_( 
      sprintf("stats::t.test(.[%s == 0], .[%s == 1])$p.value",iv,iv)
    ), 
    vars = vars_to_test)

which produces this:

          disp           hp       drat           wt         qsec
1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06

The idea of this solution is to use SE versions of dplyr functions (summarise_each_ and funs_) instead of NSE versions (summarise_each and funs). For more information about Standard Evaluation (SE) and Non-Standard Evaluation (NSE), please check vignette("nse").

2
votes

So I ended up hacking up a new function : df=dataframe , by_var=right hand side of formula, ... all variables on left hand side of formula (dplyr/tidyr select).

e.g: mult_t.test(mtcars,vs,disp:qsec)

mult_t.test<-function(df,by_var,...){
  require(dplyr)
  require(tidyr)
  by_var<-deparse(substitute(by_var))
  j<-df%>%gather(var,val,...)
  t<-j%>%group_by(var)%>%do(v=tes(.,by_var))
  k<-data.frame(levels(t$var),matrix(unlist(t$v),ncol=3,byrow = T))
  names(k)<-c("var",names(t$v[[1]]))
  k
}


tes<-function(df,vart){
  x<-t.test(df$val~df[[vart]])
  p<-x$estimate
  p<-c(p,p.val=x$p.value)
  p
}