R: Run t-test on previous years by group using dplyr

Question

I have a dataframe containing different groups, years and their values, for example:

data <- data.frame(
  group = c(rep('A', 120), rep('B', 120)),
  year  = rep(c(rep('2013-2014', 40), rep('2014-2015', 40), rep('2015-2016', 40)), 2),
  value = rnorm(240)
)

For each year within each group I want to run a t-test to see whether the values are significantly different from the previous year's. (I have been using the function t.test(x, y, var.equal = TRUE) to do this as a one-off.)

I would like to return a dataframe along with the p-values, or preferably significance stars generated using gtools::stars.pval(). So to return something like the following:
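For illustration, a one-off comparison of that kind might look like the sketch below. It rebuilds the example data with a fixed seed (the seed is an assumption added here so the result is reproducible) and compares two consecutive years for group 'A' only.

```r
set.seed(1)  # assumption: fixed seed, not part of the original example
data <- data.frame(
  group = c(rep('A', 120), rep('B', 120)),
  year  = rep(c(rep('2013-2014', 40), rep('2014-2015', 40), rep('2015-2016', 40)), 2),
  value = rnorm(240)
)

# Values for group A in two consecutive years
x <- data$value[data$group == 'A' & data$year == '2014-2015']
y <- data$value[data$group == 'A' & data$year == '2013-2014']

# Two-sample t-test assuming equal variances; keep only the p-value
p <- t.test(x, y, var.equal = TRUE)$p.value
p
```

Repeating this by hand for every group and every pair of consecutive years is the part that needs automating.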

group year      significance
A     2013-2014 NA
A     2014-2015 **
A     2015-2016 ***
B     2013-2014 NA
B     2014-2015
B     2015-2016

Where in the above the p-value for the difference between 2014-2015 and 2013-2014 for 'A' is between 0.001 and 0.01, and the p-value for the difference between 2015-2016 and 2014-2015 for 'A' is <0.001. There is no evidence of any significant difference in any years for 'B'.

There is no guarantee that each of the groups has the same number of years.

What is the best and quickest way of doing this? I was hoping that I could do it using dplyr and group_by by group and year?

Maksim Gayduk · Accepted Answer · 2015-09-09T12:14:14

Another option is to summarise the data frame, storing all the values for each group and year in one cell as a list (yes, you can do that - data frames can have nested lists inside!).
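As a quick illustration of what a list column is, here is a minimal base R sketch. The toy data frame and its column names are made up for this example and are not part of the question's data.

```r
# Toy data frame; the column names are invented for illustration
df_toy <- data.frame(id = c("a", "b"))
df_toy$vals <- list(c(1, 2, 3), c(4, 5))  # each cell holds a whole numeric vector

str(df_toy$vals)        # a list of two numeric vectors
lengths(df_toy$vals)    # number of values stored in each cell
mean(df_toy$vals[[1]])  # pull one cell back out with [[ ]]
```

Because each cell can hold a full vector, one row per group and year is enough to carry all the values a t-test needs.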

Using dplyr:

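library(dplyr)  # needs dplyr >= 1.0 for .groups and rowwise() list-column handling

df <- data %>%
  arrange(group, year) %>%
  group_by(group, year) %>%
  summarise(values = list(value), .groups = "drop_last") %>%  # one row per group/year; still grouped by group
  mutate(prev_values = lag(values)) %>%                       # previous year's values within each group
  filter(row_number() > 1) %>%                                # the first year of a group has nothing to compare against
  rowwise() %>%                                               # run one t-test per row
  mutate(p_value = t.test(values, prev_values, var.equal = TRUE)$p.value) %>%
  ungroup()
print(df)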

  group      year    values prev_values   p_value
1     A 2014-2015 <dbl[40]>   <dbl[40]> 0.7894477
2     A 2015-2016 <dbl[40]>   <dbl[40]> 0.2385581
3     B 2014-2015 <dbl[40]>   <dbl[40]> 0.3084138
4     B 2015-2016 <dbl[40]>   <dbl[40]> 0.2557849
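The question asks for significance stars rather than raw p-values. A minimal base R sketch of that last step follows. The helper name p_to_stars is hypothetical, and the cutpoints (0.001, 0.01, 0.05) are an assumption based on the thresholds described in the question; it does not try to reproduce the exact output of gtools::stars.pval().

```r
# Hypothetical helper (not part of the answer above): map p-values to star codes.
p_to_stars <- function(p) {
  codes <- c("***", "**", "*", "")
  # findInterval() counts how many cutpoints are <= p; an NA p-value stays NA
  codes[findInterval(p, c(0.001, 0.01, 0.05)) + 1]
}

# Example p-values: missing, between 0.001 and 0.01, below 0.001, not significant
p_to_stars(c(NA, 0.004, 0.0002, 0.3))
```

The helper could be applied to the p_value column computed above, for example with mutate(df, significance = p_to_stars(p_value)), or gtools::stars.pval(p_value) could be used in the same place, as the question suggests.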
