0
votes

Major Edit: I have rewritten this question because my original was poorly put; the original is kept below as a record. I need to run Fisher's exact test on tables as big as 4 x 5 with around 200 observations. This often turns out to be a major computational challenge, as explained here (I think; I can't follow it completely). As I use both R and Stata, I will frame the question for both, using some made-up data.

Stata:

    tabi 1 13 3 27 46 \ 25 0 2 5 3 \ 22 2 0 3 0 \ 19 34 3 8 1 , exact(10)

You can increase exact() up to a maximum of 1000 (but it may then take about a day before returning an error).

R:

    Job <- matrix(c(1,13,3,27,46, 25,0,2,5,3, 22,2,0,3,0, 19,34,3,8,1), 4, 5,
         dimnames = list(income = c("< 15k", "15-25k", "25-40k", ">40k"),
         satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS", "ecstatic")))
    fisher.test(Job)

For me, at least, both programs error out. So the question is: how can this calculation be done in either Stata or R?

Original Question: I have Stata and R to play with. I have a dataset with various categorical variables, some of which have multiple categories. I'd therefore like to run Fisher's exact test on tables larger than 2 x 2, for example a 2 x 6 table or a 4 x 4 table.

Can this be done with either R or Stata?

Edit: whilst this can be done in Stata, it will not work for my dataset as I have too many categories. Stata goes through endless iterations and, even when left for a day or more, does not produce a solution.

My question is really: can R do this, and can it do it quickly?

2
How big is your table remains a key detail. – Nick Cox
Yes, you are correct; I am just checking that now. n = 191, category 1 has 4 divisions, category 2 has 5. This is one that absolutely would not work for me. After about a day and a half it returned an error saying too many values. – user2498193
This gets more puzzling. I see no reason why that strains Stata. Can you post the data? – Nick Cox
Ah... I can't really (confidentiality reasons). My understanding is that it is just a raw processing power thing: each time you add a category you exponentially increase the number of possible distributions of the observations, and even at 4 x 5 and n = 191 it's a heavy-duty calculation. – user2498193
I got an instantaneous response to tabi 2 2 2 2 2 \ 13 13 13 13 13 \ 4 4 4 4 4 \ 20 20 20 20 20 , exact. That's not a proof, but it's a counterexample to the idea that 4 x 5 tables (with n about 200) are necessarily problematic. I wonder if your memory request makes things worse. – Nick Cox

2 Answers

5
votes

Have you studied the documentation of R function fisher.test? Quoting from help("fisher.test"):

For 2 by 2 cases, p-values are obtained directly using the (central or non-central) hypergeometric distribution. Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT which implements the network developed by Mehta and Patel (1986) and improved by Clarkson, Fan and Joe (1993).
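As a small illustration of the 2 by 2 case described in the quoted help text, the two-sided p-value can be reproduced by hand from the hypergeometric distribution. This is only a sketch on made-up counts (the table `tab` is not from the question), using base R's `dhyper`:

```r
# Illustrative 2 x 2 table (made-up counts, not from the question).
tab <- matrix(c(3, 1, 1, 3), nrow = 2)

# Margins that the test conditions on.
m <- sum(tab[, 1])   # first column total
n <- sum(tab[, 2])   # second column total
k <- sum(tab[1, ])   # first row total
x <- tab[1, 1]       # observed top-left cell

# All values the top-left cell can take given the margins, with their
# hypergeometric probabilities.
support <- max(0, k - n):min(k, m)
probs <- dhyper(support, m, n, k)

# Two-sided p-value: total probability of tables no more likely than the
# observed one (small relative tolerance for floating-point ties).
p_manual <- sum(probs[probs <= dhyper(x, m, n, k) * (1 + 1e-7)])
p_builtin <- fisher.test(tab)$p.value

c(manual = p_manual, builtin = p_builtin)
```

For larger r x c tables there is no single cell to enumerate, which is why the help text points to the network algorithm instead.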

This is an example given in the documentation:

Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
              dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                              satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
fisher.test(Job)

# Fisher's Exact Test for Count Data
# 
# data:  Job
# p-value = 0.7827
# alternative hypothesis: two.sided
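
The documentation example is only 4 x 4. For a table like the 4 x 5 one in the question, where the exact computation fails, one option that fisher.test itself offers is a Monte Carlo p-value via its simulate.p.value and B arguments. The following is a sketch, not a tested recipe for the real data: it uses the made-up counts from the question, and byrow = TRUE is an assumption made here so that the rows match the Stata tabi command.

```r
# Made-up table from the question. byrow = TRUE is an assumption so that
# each group of five numbers becomes one row, as in the Stata tabi call.
Job <- matrix(c(1,13,3,27,46, 25,0,2,5,3, 22,2,0,3,0, 19,34,3,8,1),
              nrow = 4, ncol = 5, byrow = TRUE)

set.seed(1)  # arbitrary seed, only so the run is reproducible

# Monte Carlo version of Fisher's test: the p-value is estimated from B
# random tables with the same margins instead of full enumeration.
res <- fisher.test(Job, simulate.p.value = TRUE, B = 1e5)
res$p.value
```

A simulated p-value can never be smaller than 1 / (B + 1), so a very small result should be reported as "less than or about 1 / (B + 1)" rather than as an exact value. fisher.test also has a workspace argument for the exact network algorithm; whether raising it is enough for this particular table is not something confirmed here.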
3
votes

As far as Stata is concerned, your original statement was totally incorrect. search fisher leads quickly to help tabulate twoway and

  • the help for the exact option explains that it may be applied to r x c as well as to 2 x 2 tables

  • the very first example in the same place of Fisher's exact test underlines that Stata is not limited to 2 x 2 tables.

It's a minimal expectation anywhere on this site that you try to read basic documentation. Please!