Is there a way to get past the "too many values" error in Stata when using tabulate?

Question

I am trying to generate frequencies for a variable in Stata conditional on categories of another variable.

This other categorical variable has about 790,000 observations for the category I am interested in.

Stata's limits of 12,000 rows for one-way tables and 1,200 rows for two-way tables make this impossible.

Every time I run tab x if y==<category of interest> I get the following error:

too many values
r(134);

I installed the bigtab package, and although it gives me tables, it cannot be used with by or run statistical tests.

Is there a work around for this?

It seems silly that Stata should have this arbitrary limit when SAS and even SPSS can run the exact same operation without trouble.

Nick Cox · Accepted Answer · 2014-03-03T16:40:22

To some it might seem silly, or at least puzzling, that people want tables with more than 12000 rows, as there must be a better way to display results or answer the question that is in mind.

That said, the limits of tabulate are hard-wired. You just need to think about how to reproduce whatever you want to show. So, for one-way frequencies:

. bysort rowvar : gen freq = _N
. by rowvar : gen tag = _n == 1 
. gsort -freq rowvar 
. list rowvar freq if tag, noobs
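To try the one-way recipe on real data, here is a sketch written as a do-file using the auto dataset that ships with Stata, with rep78 standing in for rowvar (the dataset and variable are illustrative choices only). The final tabulate is only a cross-check, feasible here because rep78 has just a handful of distinct values.

```stata
sysuse auto, clear

* One-way frequencies without tabulate; rep78 plays the role of rowvar
bysort rep78 : generate freq = _N      // number of observations with this value
by rep78 : generate tag = _n == 1      // 1 in the first observation of each group, else 0
gsort -freq rep78                      // most frequent values first
list rep78 freq if tag, noobs

* Cross-check: same counts, with missing treated as a value of its own
tabulate rep78, missing
```

Note that bysort treats missing as a value, so the listing matches tabulate with the missing option rather than plain tabulate.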

and for two-way frequencies

. bysort rowvar colvar : gen freq = _N
. by rowvar colvar : gen tag = _n == 1 
. gsort -freq rowvar colvar
. list rowvar colvar freq if tag, noobs
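A sketch of the two-way recipe on the auto dataset that ships with Stata, with rep78 as rowvar and foreign as colvar (both illustrative choices). colvar is included in the list so that each count can be matched to its cell.

```stata
sysuse auto, clear

* Two-way frequencies: rep78 plays rowvar, foreign plays colvar
bysort rep78 foreign : generate freq = _N    // size of each rep78 x foreign cell
by rep78 foreign : generate tag = _n == 1    // first observation in each cell
gsort -freq rep78 foreign                    // largest cells first
list rep78 foreign freq if tag, noobs
```

Unlike a cross-tabulation, this lists only combinations that occur in the data: empty cells, such as rep78 == 1 with foreign == 1, simply do not appear.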

A similar approach, with more bells and whistles, is coded in the groups command (SSC). An even simpler approach in many ways is to collapse or contract the dataset and then list it.
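A minimal sketch of the contract route, using the auto data with rep78 as an illustrative variable. contract replaces the data in memory with one observation per distinct value (missing included unless the nomiss option is given) plus a count variable, named freq here through the freq() option.

```stata
sysuse auto, clear

* Reduce the data to one observation per distinct rep78, count in freq
contract rep78, freq(freq)
gsort -freq rep78                  // most frequent values first
list rep78 freq, noobs
```

Because the original observations are discarded, on your own data wrap this in preserve and restore, or save first. contract also accepts several variables (for two-way counts) and an if qualifier.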

To flag the general strategy here:

1. Produce what you want as new variables.
2. Select just one observation from each group if there are multiple observations.
3. Use list, not tabulate.
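Applied to the question's tab x if y==<category of interest>, the strategy might look like the sketch below. It uses the auto data, with rep78 standing in for x and foreign == 1 for the category of interest; touse is a made-up helper name. Putting the indicator in the by: list confines the counts to the selected observations, and missing values of rep78 are excluded to mimic tabulate's default.

```stata
sysuse auto, clear

* Counterpart of:  tabulate rep78 if foreign == 1
generate byte touse = foreign == 1 & !missing(rep78)   // 1 = observations to count
bysort touse rep78 : generate freq = _N                // counts within touse groups
by touse rep78 : generate tag = _n == 1                // first observation of each group
gsort -touse -freq rep78                               // selected observations first, largest counts first
list rep78 freq if tag & touse, noobs
```

Using bysort foreign rep78 instead of the touse indicator would give the counts for every category of foreign at once, which is one way to get the by-group results the question asks about.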

UPDATE

The OP asked about each line in turn:

. bysort rowvar : gen freq = _N

OP: This generates the freq variable for the last count of every individual value in my rowvar

Me: No. The freq variable is the count of observations for every distinct value of rowvar.
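A small check of that point on the auto data (an illustrative choice). running is a hypothetical helper holding _n, the position within the group, which reaches the group size only in the last observation; freq holds the group size _N in every observation.

```stata
sysuse auto, clear

bysort rep78 : generate freq = _N      // group size, repeated in every observation
by rep78 : generate running = _n       // position within the group: 1, 2, ..., _N
list rep78 running freq if rep78 <= 2, noobs sepby(rep78)
```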

. by rowvar : gen tag = _n == 1

OP: This generates the tag variable for the first count of every unique observation in rowvar.

Me: Correct, provided you say "distinct", not "unique". Unique values occur once only.
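To see the distinction in practice, the sketch below uses mpg from the auto data (an illustrative choice): counting the tagged observations gives the number of distinct values, while freq == 1 picks out the unique values, those occurring once only.

```stata
sysuse auto, clear

bysort mpg : generate freq = _N
by mpg : generate tag = _n == 1
count if tag                  // number of distinct values of mpg
count if freq == 1            // number of unique values, each in exactly one observation
list mpg if freq == 1, noobs
```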

. gsort -freq rowvar

OP: This sorts freq and rowvar in descending order

Me: It sorts freq in descending order and rowvar in ascending order within blocks of constant freq.
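A toy example makes the ordering visible. The small dataset entered with input is made up purely for illustration: a and b occur twice, c and d once, so freq has ties.

```stata
clear
input str1 rowvar
a
b
c
a
b
d
end

bysort rowvar : generate freq = _N
by rowvar : generate tag = _n == 1

gsort -freq rowvar                 // freq descending; within equal freq, rowvar ascending
list rowvar freq if tag, noobs     // a 2, b 2, c 1, d 1

gsort -freq -rowvar                // both descending, for contrast
list rowvar freq if tag, noobs     // b 2, a 2, d 1, c 1
```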

. list rowvar freq if tag, noobs

OP: What does if do here?

Me: That one is left as an exercise.
