2
votes

I am using Stata/SE 12 with maxvar set to 32767 and memory set to 640 MB. My current dataset consists of 9,000 observations (rows) and 16,800 variables (columns v1 through v16800).

I would like to convert the dataset into long format with the reshape command, using the following line of code:

reshape long v , i(simulation) j(_count)

Stata gives me error 134: _count takes too many values.

Are there any Stata limitations on the number of observations? What could be the problem here?


2 Answers

5
votes

The limit comes from the way Stata creates the variable _count: reshape internally uses tabulate, which means it can handle at most 12,000 variables. What you can do is split the file in two, reshape each sub-file, and append them afterwards. Something like this:

// create some example data
clear
set obs 5
gen id = _n
forvalues i = 1/10 {
    gen v`i' = rnormal()
}

// split the files:
tempfile orig one two

save `orig'

keep id v1-v5
save `one'

use `orig'
keep id v6-v10
save `two'

// reshape the files separately
use `one'
reshape long v, i(id) j(_count)
save `one', replace

use `two'
reshape long v, i(id) j(_count)
save `two', replace

// bring the files together again
append using `one'
sort id _count
list, sepby(id)

2
votes

This isn't the best fix, but try splitting the variables in the main dataset into two or more datasets, reshaping each separately, and then appending the results.

I simulated and replicated this using Stata/MP 11 with set memory 2g and set maxvar 32000: I could reshape a dataset with 8,000 variables, but I got the same error when trying the same with 16,800 variables. Even with 8,000 variables the command took a while to complete. It may be better still to split into datasets of 1,000 variables each, since the eventual append loop would then require only one change.
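
A loop-based version of that split-reshape-append approach might look like the sketch below. It assumes the setup from the question (an id variable simulation and variables v1-v16800 stored in order); the chunk size of 1,000 and the macro names are illustrative choices, not anything from the original post:

// sketch: split v1-v16800 into chunks of 1,000 variables,
// reshape each chunk to long, then append the pieces
local chunksize 1000
local nvars 16800
local nchunks = `nvars' / `chunksize'

tempfile orig
save `orig'

forvalues c = 1/`nchunks' {
    use `orig', clear
    local first = (`c' - 1) * `chunksize' + 1
    local last  = `c' * `chunksize'
    keep simulation v`first'-v`last'
    reshape long v, i(simulation) j(_count)
    tempfile chunk`c'
    save `chunk`c''
}

// combine the reshaped chunks
use `chunk1', clear
forvalues c = 2/`nchunks' {
    append using `chunk`c''
}
sort simulation _count

Each chunk stays well under the 12,000-value limit, and changing the chunk size only requires editing the chunksize macro.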

reshape appears to slow down considerably beyond a couple of hundred values of j; someone else may know more about the technical background of this.