1
votes

I have a dataset in Stata of the following form

id | year 
a  | 1950       
b  | 1950  
c  | 1950
d  | 1950
.  
.
.
y  | 1950
-----
a  | 1951
b  | 1951
c  | 1951
d  | 1951
.
.
.
y  | 1951
-----
...

I'm looking for a quick way to rewrite the following code

gen dummya=1 if id=="a"
gen dummyb=1 if id=="b"
gen dummyc=1 if id=="c"
...
gen dummyy=1 if id=="y"

and

gen dummy50=1 if year==1950
gen dummy51=1 if year==1951
...
3

3 Answers

3
votes

Note that all your dummies would be created as 1 or missing. It is almost always more useful to create them directly as 1 or 0. Indeed, that is the usual definition of dummies.

In general, it's a loop over the possibilities using forvalues or foreach, but the shortcut is too easy not to be preferred in this case. Consider this reproducible example:

. sysuse auto, clear
(1978 Automobile Data)

. tab rep78, gen(rep78)

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. d rep78?

              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
rep781          byte    %8.0g                 rep78== 1.0000
rep782          byte    %8.0g                 rep78== 2.0000
rep783          byte    %8.0g                 rep78== 3.0000
rep784          byte    %8.0g                 rep78== 4.0000
rep785          byte    %8.0g                 rep78== 5.0000

That's all the dummies (some prefer to say "indicators") in one fell swoop through an option of tabulate.

For completeness, consider an example doing it the loop way. We imagine that years 1950-2015 are represented:

forval y = 1950/2015 { 
    gen byte dummy`y' = year == `y' 
} 

Two digit identifiers dummy50 to dummy15 would be unambiguous in this example, so here they are as a bonus:

forval y = 1950/2015 { 
    local Y : di %02.0f mod(`y', 100) 
    gen byte dummy`y' = year == `y' 
} 

Here byte is dispensable unless memory is very short, but it's good practice any way.

If anyone was determined to write a loop to create indicators for the distinct values of a string variable, that can be done too. Here are two possibilities. Absent an easily reproducible example in the original post, let's create a sandbox. The first method is to encode first, then loop over distinct numeric values. The second method is find the distinct string values directly and then loop over them.

clear 
set obs 3 
gen mystring = word("frog toad newt", _n) 

* Method 1 

encode mystring, gen(mynumber) 
su mynumber, meanonly 

forval j = 1/`r(max)' { 
    gen dummy`j' = mynumber == `j' 
    label var dummy`j' "mystring == `: label (mynumber) `j''" 
} 

* Method 2 

levelsof mystring

local j = 1 
foreach level in `r(levels)' { 
    gen dummy2`j' = mystring == `"`level'"' 
    label var dummy2`j' `"mystring == `level'"'
    local ++j 
} 

describe 

Contains data
  obs:             3                          
 vars:             8                          
 size:            96                          
------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
mystring        str4    %9s                   
mynumber        long    %8.0g      mynumber   
dummy1          float   %9.0g                 mystring == frog
dummy2          float   %9.0g                 mystring == newt
dummy3          float   %9.0g                 mystring == toad
dummy21         float   %9.0g                 mystring == frog
dummy22         float   %9.0g                 mystring == newt
dummy23         float   %9.0g                 mystring == toad
------------------------------------------------------------------------------
Sorted by: 
1
votes

Use i.<dummy_variable_name>

For example, in your case, you can use following command for regression:

reg y i.year
1
votes

I also recommend using

egen year_dum = group(year) 
reg y i.year_dum

This can be generalized arbitrarily, and you can easily create, e.g., year-by-state fixed effects this way:

egen year_state_dum = group(year state) 
reg y i.year_state_dum