My question concerns the proper use of # versus ## in Stata for interacting categorical and continuous variables. Here is the example I have in mind.

To understand the marginal effect of x on y, I ran an experiment with three treatments (A, B, C) on two types of subjects (M, F). To estimate the pooled marginal effect (and supposing I satisfy all OLS criteria) I can run reg y x. However, I also want to understand the marginal effect for each type of subject under each treatment, that is, the interactions of x with treatment and type.

Firstly, assuming x is continuous, is the proper syntax for estimating the pooled marginal effect and the treatment-type marginal effect

reg y x i.treatment#i.type#c.x

or

reg y i.treatment#i.type##c.x

or neither?

Secondly, is the proper syntax for estimating just the treatment-type marginal effect

reg y i.treatment#i.type#c.x, noconstant

wherein the constant is dropped? If the constant is kept, does it represent the pooled response?

Sorry if this is a rudimentary question, but after a few days reading I still struggle to grasp exactly what is the difference between # and ##. Many thanks in advance.

Note: the data proposed are clearly a panel so the xtreg command is more appropriate. To keep things simple I just pretended the data were simpler.

Edit: Here is an example with a built-in Stata dataset.

    . reg price c.mpg##i.foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =    9.48
       Model |   183435281     3  61145093.6           Prob > F      =  0.0000
    Residual |   451630115    70  6451858.79           R-squared     =  0.2888
-------------+------------------------------           Adj R-squared =  0.2584
       Total |   635065396    73  8699525.97           Root MSE      =  2540.1

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          mpg |  -329.2551   74.98545    -4.39   0.000    -478.8088   -179.7013
    1.foreign |  -13.58741   2634.664    -0.01   0.996    -5268.258    5241.084
              |
foreign#c.mpg |
           1  |   78.88826   112.4812     0.70   0.485    -145.4485     303.225
              |
        _cons |   12600.54   1527.888     8.25   0.000     9553.261    15647.81
-------------------------------------------------------------------------------

mpg and 1.foreign capture, respectively, the marginal effect of a car's miles per gallon and whether it is foreign or domestic on price. foreign#c.mpg captures the interaction between the category dummy and the continuous x when the dummy is one (i.e. the car is foreign)? What then captures the interaction of domestic (dummy is zero) with mpg?

1 Answer

1.

In the following you are including the main effect of x and a three-way interaction.

reg y x i.treatment#i.type#c.x

You are leaving out the other main effects, specifically those of treatment and type.

The following

reg y i.treatment#i.type##c.x

expands to

reg y i.treatment#i.type c.x i.treatment#i.type#c.x

which includes the main effect of x, the two-way interaction treatment#type, and the three-way interaction treatment#type#c.x.

Look around for information on the inclusion of interactions with(out) main effects. For example https://stats.stackexchange.com/questions/11009/including-the-interaction-but-not-the-main-effects-in-a-model.
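If the goal is a separate slope of x for each treatment-type cell, one option is to fit the fully factorial model and let margins report the cell-specific slopes. The following is only a sketch on simulated data: the variable names y, x, treatment and type come from the question, while the sample size and the coefficients used to generate y are made up purely for illustration.

```stata
* Simulated stand-in for the experiment (all values below are assumptions)
clear
set seed 12345
set obs 600
* treatment takes 1, 2, 3 (standing in for A, B, C); type takes 0, 1 (M, F)
generate treatment = 1 + mod(_n, 3)
generate type = mod(_n, 2)
generate x = rnormal()
* The slope of x differs by cell so that there is something to estimate
generate y = 1 + (0.5*treatment + 0.3*type)*x + rnormal()

* Full factorial: all main effects, two-way and three-way interactions
regress y i.treatment##i.type##c.x

* Slope of x within each of the six treatment-by-type cells
margins treatment#type, dydx(x)
```

With this specification the coefficient on x is the slope in the base cell only; the slopes for the other cells are sums of coefficients, which margins assembles together with their standard errors.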

2.

You say

I still struggle to grasp exactly what is the difference between # and ##.

This can be clarified by reading help fvvarlist and the manual. At this stage, the problem doesn't seem to be what the syntax implies, but rather how to specify the model, which will depend on what theory suggests, what has been done before, etc.
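A quick way to check what a ## term stands for is to ask Stata to expand it. The sketch below uses the built-in auto dataset from the question and the fvexpand command, then fits the shorthand and the written-out versions of the same model so the two coefficient tables can be compared side by side.

```stata
sysuse auto, clear

* List the individual terms that the ## shorthand stands for
fvexpand c.mpg##i.foreign
display "`r(varlist)'"

* These two commands fit the same model
regress price c.mpg##i.foreign
regress price c.mpg i.foreign c.mpg#i.foreign
```

In short, # requests only the interaction term, while ## requests the interaction plus all the lower-order terms that make it up.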

3.

From your example

reg price c.mpg##i.foreign

which expands to

reg price c.mpg i.foreign c.mpg#i.foreign

(the two main effects of mpg and foreign, and the interaction between them), you ask

foreign#c.mpg captures the interaction between the category dummy and the continuous x when the dummy is one (i.e. the car is foreign)? What then captures the interaction of domestic (dummy is zero) with mpg?

Writing out the model helps (again two main effects and the interaction):

price = 12600.54 - 329.2551 mpg - 13.58741 foreign + 78.88826 mpg foreign

This shows how the effect of mpg on price depends on the value of foreign, and how the effect of foreign on price depends on the value of mpg. Because foreign takes only the values 0 and 1, the effect of mpg on price is the easier one to understand. Just substitute out foreign in

- 329.2551 mpg + 78.88826 mpg foreign

When foreign == 1, the effect of mpg is -250.36684. When foreign == 0, the effect is -329.2551.
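The same two slopes can be obtained from Stata directly, with standard errors, rather than by adding coefficients by hand. This is a sketch using the auto dataset and the model from the question; lincom combines the stored coefficients, and margins reports the slope of mpg at each level of foreign.

```stata
sysuse auto, clear
regress price c.mpg##i.foreign

* Slope of mpg for foreign cars: main effect plus interaction coefficient
lincom mpg + 1.foreign#c.mpg

* Slope of mpg for domestic and for foreign cars in one table
margins foreign, dydx(mpg)
```

So nothing is missing for domestic cars: their slope is simply the coefficient on mpg, because domestic is the base level of foreign.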

The effect of foreign on price is computed likewise:

- 13.58741 foreign + 78.88826 mpg foreign

but now substituting out mpg. Because mpg is continuous, you should probably plug in several values of it to better understand the effect of foreign on price (see help margins).
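As a sketch of that last step, margins can evaluate the foreign-versus-domestic price difference at a grid of mpg values. The grid from 15 to 35 in steps of 5 is an arbitrary choice made here for illustration; it lies inside the range of mpg in the auto dataset.

```stata
sysuse auto, clear
regress price c.mpg##i.foreign

* Difference in predicted price, foreign minus domestic, at chosen mpg values
margins, dydx(foreign) at(mpg=(15(5)35))
```

Each row of the resulting table is -13.58741 + 78.88826 mpg evaluated at the given mpg, together with its standard error and confidence interval.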

(That's all assuming the corresponding coefficients are statistically significant.)

I sense your question involves trouble in understanding both Stata syntax and statistical issues. The first can be clarified by reading Stata's help resources. Regarding the second, your question is phrased in such a way that people at Cross Validated dismissed it as a programming problem.