Fisher's exact test is often used for over-representation analysis, that is, testing whether the genes of a pathway are over-represented in a gene list. Consider the following contingency table:

              in pathway
                 Y   N
in gene list  Y 90  110  |  200 
              N 10  790  |  800
              ------------------
               100  900  | 1000

There are essentially two ways to run a Fisher-test-based over-representation analysis in R. The first is fisher.test, which takes the 2x2 contingency matrix as input (matrix() fills column by column, so the first column holds the in-pathway counts 90 and 10):

fisher.test(matrix(c(90,10,110,790), nrow = 2), alternative = 'greater')$p.value
[1] 1.486473e-59
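To see how the matrix lines up with the table above, here is a minimal sketch that rebuilds it with labelled dimensions and prints the margins. The dimnames labels and the variable names tab and p_fisher are illustrative additions, not part of the original question.

```r
# Rebuild the 2x2 table with labelled dimensions.
# matrix() fills column by column, so c(90, 10, 110, 790) gives
# column "Y" (in pathway) = 90, 10 and column "N" (not in pathway) = 110, 790.
tab <- matrix(c(90, 10, 110, 790), nrow = 2,
              dimnames = list(in_gene_list = c("Y", "N"),
                              in_pathway   = c("Y", "N")))
print(tab)

# Row and column totals should reproduce the margins: 200, 800, 100, 900, 1000
print(addmargins(tab))

# One-sided test for over-representation, as in the question
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
print(p_fisher)
```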

The second is to call phyper directly (Meng's notes give an excellent explanation of how to use phyper, including why the "-1" is needed and what q, m, n and k mean):

phyper(q=90-1, m=100, n=900, k=200, lower.tail = FALSE)
[1] 1.486473e-59
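To make the correspondence between the table and phyper's arguments explicit, here is a minimal sketch. It assumes the urn-model reading of the arguments given in R's ?phyper help page; the variable names are shorthand chosen to match the argument names. It also cross-checks the upper tail against a direct sum of dhyper point probabilities.

```r
# Urn-model reading of phyper's arguments (see ?phyper), filled in from the table:
#   m = genes in the pathway ("white balls")      = 100
#   n = genes not in the pathway ("black balls")  = 900
#   k = genes in the gene list ("balls drawn")    = 200
#   x = pathway genes found in the gene list      = 90
x <- 90; m <- 100; n <- 900; k <- 200

# phyper(q, ..., lower.tail = FALSE) gives P(X > q), so P(X >= x) needs q = x - 1
p_upper <- phyper(q = x - 1, m = m, n = n, k = k, lower.tail = FALSE)

# Cross-check: sum P(X = i) for i = x, ..., min(m, k);
# X cannot exceed min(m, k) = 100 because only 100 pathway genes exist
p_sum <- sum(dhyper(x:min(m, k), m = m, n = n, k = k))

print(c(p_upper = p_upper, p_sum = p_sum))
# Relative difference between the two; should be tiny (rounding only)
print(abs(p_upper - p_sum) / p_upper)
```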

My question: why does the result differ when I take the complement of the lower tail instead?

1 - phyper(q=90-1, m=100, n=900, k=200, lower.tail = TRUE)
[1] 0
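A quick check, with the same numbers, of where the 0 comes from. This only inspects values and assumes IEEE 754 double precision, which R uses for numeric vectors.

```r
# Lower tail P(X <= 89); its true value is 1 - 1.486473e-59
p_lower <- phyper(q = 90 - 1, m = 100, n = 900, k = 200, lower.tail = TRUE)

# Doubles just below 1 are spaced about 1.1e-16 apart, so 1 - 1.486473e-59
# cannot be stored and rounds to exactly 1
print(p_lower == 1)               # TRUE
print(1 - p_lower)                # 0: the tiny upper tail is already lost
print((1 - 1.486473e-59) == 1)    # TRUE: same rounding, without phyper
print(.Machine$double.eps)        # gap between 1 and the next larger double
```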

Answer

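When you ask phyper for the tail you are interested in (here lower.tail = FALSE), its C-level code computes that tail directly, which avoids the floating-point cancellation that ruins 1 - phyper(..., lower.tail = TRUE). In this example the lower-tail probability is 1 - 1.486473e-59, which cannot be represented as a double: the nearest double is exactly 1, so phyper(..., lower.tail = TRUE) returns 1, and subtracting it from 1 gives 0. Always request the tail you need directly rather than computing it as 1 minus the other tail.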
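In practice this means requesting the upper tail directly. A minimal sketch with the question's numbers; log.p = TRUE is a standard phyper argument that returns the natural log of the requested tail, which helps when p-values are reported on a -log10 scale or are small enough to underflow to 0.

```r
x <- 90; m <- 100; n <- 900; k <- 200

# Request the tail of interest directly: P(X >= 90) = P(X > 89)
p_upper <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# Same tail on the natural-log scale; stays finite even where p would underflow
log_p_upper <- phyper(x - 1, m, n, k, lower.tail = FALSE, log.p = TRUE)

# Convert the natural log to -log10(p) for reporting
neg_log10_p <- -log_p_upper / log(10)

print(p_upper)
print(neg_log10_p)   # about 58.83
```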