apply p.adjust function after group_by in dplyr

Question

My data dat is like this

set.seed(123)        
dat<- data.frame(
                comp = rep(1:4,2),
                grp = rep(c('A','B'), each=4),
                pval = runif(8, min=0, max=0.1) )
dat$pval[sample(nrow(dat), 1)] <- NA

pval column contains a list of p values from multiple ttest within each large group.
Now I need to apply the base r function p.adjust to adjust the p values within each group (A,B,...) what I did was:

dat %>%
    group_by(grp) %>% 
    mutate(pval.adj = p.adjust (pval, method='BH'))

Below is the output of the above code:

comp grp  pval       pval.adj
1   A   0.02875775  0.08179538  
2   A   0.07883051  0.08830174  
3   A   0.04089769  0.08179538  
4   A   0.08830174  0.08830174  
1   B   NA  NA  
2   B   0.00455565  0.01366695  
3   B   0.05281055  0.07921582  
4   B   0.08924190  0.08924190

The result does not make sense. The last entry of each group, pval and pval.adj are equal. Some pval.adj are much closer to pval than others. I think something is wrong with applying the p.adjust function after group_by. It took me hours but could not figure out why... I appreciate if someone could help me with that.

below is the p.adjust function usage:

p.adjust(p, method = p.adjust.methods, n = length(p))
p.adjust.methods
# c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
#   "fdr", "none")

Can you set a seed and run again? If we can get the same pval values you have, it would be easier to determine what the results should be. — Phil
Agree with the set.seed. I get the same results whether I run p.adjust in dplyr with a group by or if I pull a subset for one group out and run it by itself. It would also be good if you can explain why you "think something is wrong". It surprises me that I get the same value for two of the adjusted values in Group A (as do you), but this doesn't seem to have anything to do with dplyr or group_by. p.adjust(c(0.0983715529320762, 0.0183719095773995, 0.0271179967094213, 0.048782999929972), method = "BH") gives the same result. — Gregor Thomas
@Gregor you are right. One of the pval and pval.adj are equal. I thought after adjustment, pval.adj should always be larger.... seems like my programming part is right — zesla

Chuck P Chuck P · Accepted Answer · 2018-01-11T22:12:36

@zesla your code is fine. Your confusion lies in the difference between family-wise error rate and false discovery rate which is what BH does. With BH you are much more likely to see those equal values.

If you look at the doco for p.adjust and run the sample code:

set.seed(123)
x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
p <- 2*pnorm(sort(-abs(x)))

round(p, 3)
round(p.adjust(p), 3)
round(p.adjust(p, "BH"), 3)

you'll see the same effect. You can also run a traditional family-wise error rate adjustment like Holm to see the effect on your own data...

dat %>%
  group_by(grp) %>% 
  mutate(pval.adj = p.adjust (pval, method='holm'))

See also this article on how BH is calculated

apply p.adjust function after group_by in dplyr

2 Answers