
Why does Stata complain with a cryptic error when I use string variables in the table command?

Consider the following toy example:

sysuse auto, clear
decode foreign, g(foreign_str)
table foreign, contents(n foreign_str mean mpg)

In Stata 13.1, this raises error r(111), "variable __000002 not found".

Tracing the error shows that table is trying to run format __000002 %9.0gc and fails because that variable does not exist. If I switch the order of the statistics in the clist, that is, if I run table foreign, contents(mean mpg n foreign_str), I get the same error but with __000003 instead of __000002.
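A minimal sketch for reproducing the trace described above, assuming Stata 13.1 as in the question. capture noisily keeps the error message visible while letting the do-file continue, and the local rc is a hypothetical name used only to hold the return code. The trace output is long; the interesting lines are the calls to collapse and format inside table.

```stata
sysuse auto, clear
decode foreign, generate(foreign_str)

set trace on
* capture noisily shows the error but does not stop the do-file
capture noisily table foreign, contents(n foreign_str mean mpg)
local rc = _rc                      // hypothetical local: table's return code
set trace off

display "table returned r(`rc')"    // the question reports r(111) in Stata 13.1
```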

So it appears that Stata crashes when it finds the string variable. If I replace the string variable with a numeric variable, the error doesn't occur.

I know it is not meaningful to compute summary statistics on string variables, but counting the number of observations of a string variable (in each group specified by the rowvar) makes perfect sense.
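One way to count non-missing string values per group without passing a string to contents() is to summarize a numeric indicator instead. This is a sketch, not the asker's method: has_foreign_str is a hypothetical helper variable, and the contents() syntax is the Stata 13-era table syntax used in the question.

```stata
sysuse auto, clear
decode foreign, generate(foreign_str)

* Hypothetical helper: 1 if foreign_str is non-empty, 0 otherwise.
* missing() accepts string arguments and treats "" as missing.
generate byte has_foreign_str = !missing(foreign_str)

* Summing the indicator counts the non-missing strings in each row of the table.
table foreign, contents(sum has_foreign_str mean mpg)
```

Unlike freq, which counts every observation in the group, summing the indicator skips observations whose string value is empty. In the auto data every observation has a label, so both give the same counts here.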

You can use table foreign, contents(freq mean mpg) to get around this. - dimitriy
@DimitriyV.Masterov Right, that does exactly what I want; I'm mainly curious as to why Stata throws this (in my opinion) cryptic error. It seems like the command should check for string variables first and throw a "type mismatch" or "string variable not supported" error, instead of a "variable not found" error (since the variable obviously exists). - Michael A
Unfortunately, I don't really know the answer. - dimitriy
@DimitriyV.Masterov See my answer about why this happens in Stata 13. - user8682794

1 Answer


Stata complains because the collapse command, which table uses internally, never creates variable __000002 (or __000003 if you change the order). collapse fails with the following error:

collapse (count) foreign_str
type mismatch
r(109);

The user never sees this error because table calls collapse through capture, as the output from trace confirms:

- capture collapse `clist' `wgt', by(`varlist' `by') fast `cw'
= capture collapse  (count) __000002=foreign_str (mean) __000003=mpg , by(foreign ) fast

After this call, table only checks for return codes 111 and 135. Any other error, such as r(109), goes unhandled, so table keeps running until it hits a wall when it cannot find the variables that collapse was supposed to create.
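A minimal sketch of how capture hides the collapse error, assuming the same data as in the question. preserve and restore protect the data in memory in case collapse succeeds, and the local rc is a hypothetical name for the saved return code. The answer reports r(109) for Stata 13; since collapse handles strings differently in later versions, the result there may differ.

```stata
sysuse auto, clear
decode foreign, generate(foreign_str)

preserve                                    // collapse would replace the data in memory
capture collapse (count) foreign_str, by(foreign) fast
local rc = _rc                              // hypothetical local: save before restore
restore                                     // back to the original 74 observations

display "collapse returned r(`rc')"         // r(109), type mismatch, in Stata 13
```

Without capture, the same collapse line stops immediately with "type mismatch", which is the informative message that table swallows.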

Stata 14 and later versions check the variable(s) provided by the user in the contents() option and only accept numeric types, issuing a more informative error message if this is not the case.

It is also worth pointing out that collapse treats strings differently in more recent Stata versions.