
I'm trying to calculate some odds ratios and significance for something that can be put into a 2x2 table. The problem is that the Fisher test in SAS is taking a long time.

I already have the cell counts. I could calculate a chi-square if not for the fact that some of the sample sizes are extremely small. And yet some are extremely large, with cell sizes in the hundreds of thousands.

When I try to compute these in R, I have no problem. However, when I try to compute them in SAS, it either takes way too long, or simply errors out with the message "Fishers exact test cannot be computed with sufficient precision for this sample size."

When I create a toy example (pull one instance from the dataset, and calculate it) it does calculate, but takes a long time.

data Bob;
  input targ $ status $ wt;
  cards;
A c 4083
A d 111
B c 376494
B d 114231
;
run;

proc freq data = Bob;
  weight wt;
  tables targ*status;
  exact fisher;
run;

What is going wrong here?

You can explicitly request Monte Carlo estimation, as suggested for large problems, with the MC option to the exact statement. - SRSwift

1 Answer


That's funny. SAS calculates the Fisher's exact test p-value the exact way, by enumerating the hypergeometric probability of every single table in which the odds ratio is at least as extreme in favor of the alternative hypothesis. There's probably a way for me to calculate how many tables that is, but knowing that it's big enough to slow SAS down is enough.

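R takes a faster route. For a 2x2 table, fisher.test evaluates the hypergeometric density over the possible tables directly (via dhyper); Monte Carlo simulation (simulate.p.value = TRUE) only applies to tables larger than 2x2. This works just as well in large samples as in small ones.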
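For reference, the exact two-sided calculation for a 2x2 table with fixed margins can be sketched in a few lines of R. This is only an illustration under the usual "sum every table no more likely than the observed one" definition; the helper name fisher_p_2x2 is made up here and is not part of R or SAS, and it is not a claim about what either package does internally.

```r
# Hypothetical helper (not part of R or SAS): two-sided Fisher p-value for a 2x2 table
fisher_p_2x2 <- function(tab) {
  m <- sum(tab[, 1])  # first column total
  n <- sum(tab[, 2])  # second column total
  k <- sum(tab[1, ])  # first row total
  x <- tab[1, 1]      # observed top-left cell
  # With the margins fixed, the top-left cell determines the whole table
  support <- max(0, k - n):min(k, m)
  logd <- dhyper(support, m, n, k, log = TRUE)
  obs <- logd[support == x]
  # Sum the probabilities of all tables no more likely than the observed one
  # (the small tolerance guards against floating-point ties)
  p <- sum(exp(logd[logd <= obs + 1e-7]))
  list(n_tables = length(support), p_value = p)
}

res <- fisher_p_2x2(matrix(c(4083, 111, 376494, 114231), 2, 2))
res$n_tables  # number of tables compatible with the margins
res$p_value
```

For the toy table the smallest margin is 4194, so there are only 4195 candidate tables, which suggests the sheer number of tables is not the whole story behind the slowdown.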

tab <- matrix(c(4083, 111, 376494, 114231), 2, 2)
pc <- proc.time()
fisher.test(tab)
proc.time()-pc

gives us

> tab <- matrix(c(4083, 111, 376494, 114231), 2, 2)
> pc <- proc.time()
> fisher.test(tab)

        Fisher's Exact Test for Count Data

data:  tab
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  9.240311 13.606906
sample estimates:
odds ratio 
  11.16046 

> proc.time()-pc
   user  system elapsed 
   0.08    0.00    0.08 
> 

A fraction of a second.

That said, the smart statistician would realize that, in tables such as yours, the normal approximation to the log odds ratio is fairly good, and so the Pearson chi-square test should give very similar results.
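A minimal sketch of that large-sample route on the toy table, assuming the standard Wald interval for the log odds ratio and the Pearson test without continuity correction (the variable names are just for illustration):

```r
tab <- matrix(c(4083, 111, 376494, 114231), 2, 2)

# Sample odds ratio (ad / bc) and the usual large-sample standard error of its log
or_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
se_log <- sqrt(sum(1 / tab))

# Wald 95% interval on the log scale, transformed back to the odds-ratio scale
ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log)

# Pearson chi-square test without continuity correction
chi <- chisq.test(tab, correct = FALSE)

or_hat
ci
chi$p.value
```

This should give an odds ratio of about 11.16 with an interval of roughly 9.2 to 13.5, close to the exact interval from fisher.test.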

People claim two very different advantages for Fisher's exact test: some say it's good in small sample sizes; others say it's good when cell counts are very small in specific margins of the table. The way that I've come to understand it is that Fisher's exact test is a nice alternative to the chi-square test when bootstrapped datasets are somewhat likely to generate tables with infinite odds ratios. In those cases you can imagine the normal approximation to the log odds ratio breaking down.
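One rough way to check that heuristic is to resample the table and count how often a cell comes out empty, since an empty cell makes the sample odds ratio zero or infinite. The sketch below assumes a simple multinomial bootstrap of the four cells; prop_degenerate and the small example table are hypothetical, made up for illustration.

```r
# Hypothetical helper: share of bootstrap resamples in which some cell is zero
# (a zero cell makes the sample odds ratio 0 or infinite)
prop_degenerate <- function(tab, B = 2000) {
  n <- sum(tab)
  # Resampling the n observations gives multinomial cell counts;
  # each column of boot is one resampled table
  boot <- rmultinom(B, size = n, prob = as.vector(tab) / n)
  mean(colSums(boot == 0) > 0)
}

set.seed(1)
small <- matrix(c(7, 1, 2, 6), 2, 2)                # made-up small table
big   <- matrix(c(4083, 111, 376494, 114231), 2, 2) # toy table from the question

prop_degenerate(small)
prop_degenerate(big)
```

A share well above zero, as expected for the small table, points toward the exact test; a share of essentially zero, as for the toy table, suggests the chi-square approximation is safe.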