I'm running a regression in Stata for which I would like to use cluster2 (http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/se_programming.htm).

I encounter the following problem: Stata reports "factor variables and time-series operators not allowed". I am using a large vector of controls and rely heavily on Stata's factor-variable and time-series notation for interactions.

For example: state##c.wind_speed##L.c.relative_humidity. cluster2, like some other Stata packages, does not allow such expressions as independent variables. Is there an efficient way to create such a long vector of interaction variables myself?

2 Answers

1
votes

I believe that one can trick ivreg2 by Baum-Shaffer-Stillman into running OLS with two-way clustering and interactions thusly:

. webuse nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. ivreg2 ln_w grade c.age##c.ttl_exp tenure, cluster(idcode year)

OLS estimation
--------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on idcode and year

Number of clusters (idcode) =     4697                Number of obs =    28099
Number of clusters (year) =         15                F(  5,    14) =   674.29
                                                      Prob > F      =   0.0000
Total (centered) SS     =  6414.823933                Centered R2   =   0.3206
Total (uncentered) SS   =  85448.21266                Uncentered R2 =   0.9490
Residual SS             =  4357.997339                Root MSE      =    .3938

---------------------------------------------------------------------------------
                |               Robust
        ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          grade |   .0734785    .002644    27.79   0.000     .0682964    .0786606
            age |  -.0005405    .002259    -0.24   0.811    -.0049681    .0038871
        ttl_exp |   .0656393   .0068499     9.58   0.000     .0522138    .0790648
                |
c.age#c.ttl_exp |  -.0010539   .0002217    -4.75   0.000    -.0014885   -.0006194
                |
         tenure |   .0197137   .0029555     6.67   0.000      .013921    .0255064
          _cons |   .5165052   .0529343     9.76   0.000     .4127559    .6202544
---------------------------------------------------------------------------------
Included instruments: grade age ttl_exp c.age#c.ttl_exp tenure
------------------------------------------------------------------------------

Just to be sure, compare that to the OLS coefficients, which should be identical (only the standard errors differ):

. reg ln_w grade c.age##c.ttl_exp tenure

      Source |       SS           df       MS      Number of obs   =    28,099
-------------+----------------------------------   F(5, 28093)     =   2651.79
       Model |  2056.82659         5  411.365319   Prob > F        =    0.0000
    Residual |  4357.99734    28,093  .155127517   R-squared       =    0.3206
-------------+----------------------------------   Adj R-squared   =    0.3205
       Total |  6414.82393    28,098  .228301798   Root MSE        =    .39386

---------------------------------------------------------------------------------
        ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          grade |   .0734785   .0010414    70.55   0.000     .0714373    .0755198
            age |  -.0005405    .000663    -0.82   0.415    -.0018401    .0007591
        ttl_exp |   .0656393   .0030809    21.31   0.000     .0596007    .0716779
                |
c.age#c.ttl_exp |  -.0010539   .0000856   -12.32   0.000    -.0012216   -.0008862
                |
         tenure |   .0197137   .0008568    23.01   0.000     .0180344     .021393
          _cons |   .5165052   .0206744    24.98   0.000     .4759823     .557028
---------------------------------------------------------------------------------
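
The same approach might extend to a specification closer to the one in the question. What follows is only a sketch, assuming a recent ivreg2 from SSC that accepts both factor variables and time-series operators. The regressors i.race##c.age and L.tenure are illustrative choices from the nlswork data, not the question's variables.

```stata
* Assumption: ivreg2 and its dependency ranktest are installed from SSC
* ssc install ivreg2
* ssc install ranktest

webuse nlswork, clear
xtset idcode year          // declare the panel so that L. is defined

* Illustrative specification: categorical-by-continuous interaction plus a lag
ivreg2 ln_wage grade i.race##c.age L.tenure, cluster(idcode year)

* Option small requests t and F statistics instead of z and chi-squared
ivreg2 ln_wage grade i.race##c.age L.tenure, cluster(idcode year) small
```

Because the survey years in nlswork have gaps, L.tenure is missing for many observations, so the estimation sample here will be smaller than in the output above.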
0
votes

You don't include a verifiable example. See https://stackoverflow.com/help/mcve for key advice.

At first sight, however, the problem is that cluster2 is an oldish program written in 2006/2007 whose syntax statement just doesn't allow factor variables.

You could try hacking a clone of the program to fix that; I have no idea whether that would be sufficient.
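
To illustrate what such a change involves, here is a toy sketch. The wrapper programs myreg_old and myreg_new are hypothetical and are not taken from cluster2. They differ only in whether the syntax statement carries the fv and ts options. Whether the rest of cluster2 would cope with factor variables after that change is a separate matter.

```stata
capture program drop myreg_old
program define myreg_old
    version 11
    * No fv or ts option: factor variables and time-series operators are rejected
    syntax varlist(min=2) [if] [in]
    regress `varlist' `if' `in'
end

capture program drop myreg_new
program define myreg_new
    version 11
    * fv and ts options: factor variables and time-series operators are accepted
    syntax varlist(min=2 fv ts) [if] [in]
    regress `varlist' `if' `in'
end

sysuse auto, clear
capture noisily myreg_old price c.mpg##c.weight   // expected to fail at syntax
myreg_new price c.mpg##c.weight                   // passes the varlist on to regress
```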

No specific comment is possible on the "other Stata packages" that you imply have the same problem, except that it may well arise for the same reason. Factor variables were introduced in Stata 11 in 2009 (see help fvvarlist for documentation), and older programs won't allow them without modification.

In general, I would ask questions like this on Statalist. It's quite likely that this program has been superseded by some different program.

If you find a Stata program on the internet without a help file, as appears to be the case here, it is usually an indicator that the program was written ad hoc and is not being maintained. In this case, it is evident also that the program has not been updated in the 6 years since Stata 11.

You could also, as you imply, just create the interaction variables yourself. I don't think anyone has written a really general tool to automate that: there would be no point (since 2009) in a complicated alternative to factor variable notation.
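
A minimal sketch of creating such variables by hand, using the nlswork data rather than the question's variables. The new variable names (age_x_ttl, lag_tenure, race_*) are made up for illustration, and the second regression assumes that race takes the values 1, 2 and 3.

```stata
webuse nlswork, clear
xtset idcode year          // needed so that L. is defined

* Continuous-by-continuous: equivalent to c.age#c.ttl_exp
generate age_x_ttl = age * ttl_exp

* Lagged variable: equivalent to L.tenure
generate lag_tenure = L.tenure

* Categorical-by-continuous: one indicator and one product per level of race
levelsof race, local(levels)
foreach l of local levels {
    generate byte race_`l' = (race == `l') if !missing(race)
    generate race_`l'_x_age = race_`l' * age
}

* Same coefficients as regress ln_wage grade c.age##c.ttl_exp tenure
regress ln_wage grade age ttl_exp age_x_ttl tenure

* Omit one level (here race_1) as the base category
regress ln_wage grade race_2 race_3 age race_2_x_age race_3_x_age lag_tenure

* Alternative: fvrevar builds equivalent temporary variables
fvrevar c.age##c.ttl_exp L.tenure
display "`r(varlist)'"
```

The resulting plain variable names can then be passed to a command that rejects factor-variable notation. Note that the variables created by fvrevar are temporary, so they should be used within the same do-file or program that creates them.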