Stata: using egen group() to create unique identifiers

Question

I have a dataset where each row is a firm-year pair, and the firm identifier firmid is a string.

If I do

duplicates drop firmid year, force

it doesn't delete anything since there are no duplicates (I originally created the dataset after running duplicates drop firmid year, force).

So far so good. I want to create a panel which requires a firmid that is numeric. So I run

egen newid = group(firmid)
xtset newid year

But the 'repeated time values in panel' error pops up. Moreover,

duplicates list newid year

lists a whole bunch of duplicates.

It seems as though egen, group() isn't generating unique groups. My question is: why, and how do I create unique groups in a robust way?

Can you please post a reproducible example? For example, the complete offending code with a minimal data input that recreates the problem. See help input for creating short example data within a do-file. — Roberto Ferrer
Can you show the firmid for the duplicates? It would be handy to see all three variables when there are duplicates. — Richard Herron

jphaller · Accepted Answer · 2014-11-12T15:24:01

This is an old thread, but I have recently experienced the same symptoms, so I wanted to share my solution. Of course, unless the questioner gives further details, we cannot know whether the cause is the same in both cases.

The problem turned out to be an issue of precision. By default, egen stores the new variable as a float, and, as explained here in section 4.4, integers stored as floats are exact only in the range up to 16,777,216 (2^24). So, if you have more than 16,777,216 firms in your sample, rounding error will result in the same ID being assigned to multiple firms. This is straightforwardly dealt with by storing the ID variable as a long:

egen long newid = group(firmid)
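To see the rounding for yourself and to confirm that the fix worked, something along these lines may help. It is only a sketch: it assumes the variables are named firmid, newid and year as in the question, and that newid has not been created yet. The two checks at the end are my own suggestion and not part of the original fix.

```stata
* A float holds every integer exactly only up to 2^24 = 16,777,216.
* The second value below cannot be stored exactly as a float.
display %12.0f float(16777216)
display %12.0f float(16777217)

* Create the group ID with storage type long instead of the default float
egen long newid = group(firmid)

* Check that every newid maps to exactly one firmid
* (the assert fails if two different firms share an ID)
bysort newid (firmid): assert firmid[1] == firmid[_N]

* Check that newid and year jointly identify the observations,
* then declare the panel
isid newid year
xtset newid year
```

If the assert or isid stops with an error, the duplicates have some other cause than float precision, and listing firmid, newid and year for the offending rows would be the next step.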
