Post-estimation commands in Stata _b[] for categorical variables

Question

EDIT: a working example is provided further down

ORIGINAL: After estimation, the stored coefficients can be used to predict the value of the dependent variable. For example, you can type _b[_cons] + _b[x1]*1 + _b[x2] to get a predicted value of Y. In most Stata examples online, the predictors are either dummies or continuous. What if I have a categorical variable that is tedious to convert by hand into multiple dummies (such as 52 weeks)? Can I keep my categorical variables as they are and still run a post-estimation command like the one below, telling Stata to pick the right value?

regress write female read

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   77.21
       Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
    Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
-------------+------------------------------           Adj R-squared =  0.4337
       Total |   17878.875   199   89.843593           Root MSE      =  7.1327

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
        read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
       _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
------------------------------------------------------------------------------
and then ask

gen _b[_cons] + _b[female]*1 + _b[read]*52   

display _b[_cons] + _b[female]*1 + _b[read]*52
55.141383

WORKING EXAMPLE: To illustrate my point, here is a small data sample that contains one categorical variable (pack), one continuous variable (price), and one dichotomous indicator (type). After running a regression, I want to run a post-estimation command (like predict or a simple gen) that generates predicted values. The only Stata code I have found so far predicts y from continuous and binary variables, but not from categorical ones. Is there code that can include pack without converting it into multiple binary variables?

clear
input  units price pack type
32 4 6 1
2 20 18 1
34 5 6 1
32 8 6 0
29 5 6 0
5 10 12 0
7 10 12 0
1 10 18 0

end

reg units price type i.pack
predict yhat
*OR
gen yhat=_b[_cons]+_b[_type]+....??pack??

Note that your generate statement is not legal code. This is a broad question about basics: help estimates is one place to start. — Nick Cox
Hi Nick. I have difficulty finding a post-estimation command that would choose an appropriate categorical value. It only works for dummies and continuous variables. Any suggestion where I could look for it? — Olga
Sorry; I don't understand at all. I don't know how you can have a categorical predictor except through a set of indicators. Broader discursive questions like yours go down better on Statalist, but precise examples based on data and code are much more likely to get good answers in any forum. — Nick Cox
I don't understand why you feel that predict does not work in your example. — user4690969
You seem to be expecting that there is a single term for a categorical predictor with three or more classes. But there isn't. — Nick Cox

dimitriy · Accepted Answer · 2017-02-10T22:07:59

It is not immediately clear what you mean by the "right" value. predict takes whatever values are currently in the data and multiplies each by the corresponding coefficient (assuming you used factor-variable notation).
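To make that concrete, here is a minimal sketch (not part of the original answer) that temporarily overwrites the covariates and then calls predict. The values 5, 1 and 12 are arbitrary illustrative choices, and the names yhat_cf and cf are hypothetical.

```stata
* Sketch: predict evaluates the fitted equation at whatever is in memory
clear
input units price pack type
32 4 6 1
2 20 18 1
34 5 6 1
32 8 6 0
29 5 6 0
5 10 12 0
7 10 12 0
1 10 18 0
end

regress units price type i.pack

preserve                      // keep a copy of the original data
replace price = 5             // illustrative values, not from the question
replace type  = 1
replace pack  = 12            // a level that appears in the estimation sample
predict double yhat_cf        // hypothetical name; xb is the default after regress
scalar cf = yhat_cf[1]        // every row now holds the same prediction
restore                       // original data come back; yhat_cf is dropped

display scalar(cf)
```

Because pack was entered as i.pack, Stata picks the 12.pack coefficient on its own; no hand-built dummies are needed.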

However, margins has a semi-documented generate() option that can give you individual predictions at flexibly chosen values of covariates. It will likely give you whatever your heart desires.

Here's an example using your data:

clear
input  units price pack type
32 4 6 1
2 20 18 1
34 5 6 1
32 8 6 0
29 5 6 0
5 10 12 0
7 10 12 0
1 10 18 0
end

reg units price type i.pack, coefl
predict double yhat1
margins, predict(xb) gen(yhat2) // match predict #1
margins, predict(xb) gen(yhat3) at((asobserved) price type pack) // match predict #2
gen double yhat4=_b[_cons] + _b[price]*price + _b[type]*type + _b[12.pack]*12.pack + _b[18.pack]*18.pack //match predict #3
margins, predict(xb) gen(yhat5) at(price = 5 type=1 pack=6) // choose some values
gen double yhat6=_b[_cons] + _b[price]*5 + _b[type]*1 + _b[12.pack]*0 + _b[18.pack]*0 // yhat5 by hand 
list yhat*, clean noobs

The first four methods give identical predictions. The fifth and sixth differ from the first four but match each other, and they are constant across observations because every covariate is fixed at a particular value:

. list yhat*, clean noobs

        yhat1      yhat21      yhat31       yhat4      yhat51       yhat6  
    32.773585   32.773585   32.773585   32.773585   32.764151   32.764151  
    2.4622642   2.4622642   2.4622642   2.4622642   32.764151   32.764151  
    32.764151   32.764151   32.764151   32.764151   32.764151   32.764151  
    30.716981   30.716981   30.716981   30.716981   32.764151   32.764151  
    30.745283   30.745283   30.745283   30.745283   32.764151   32.764151  
            6           6           6           6   32.764151   32.764151  
            6           6           6           6   32.764151   32.764151  
    .53773585   .53773585   .53773585   .53773585   32.764151   32.764151

See help margins generate and help undocumented to learn more.
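If only a single point prediction is needed, lincom and a plain margins call are two further options. The sketch below is an addition under stated assumptions rather than part of the original answer: it reuses the question's data, fixes price = 5 and type = 1 as in the example above, and the names p6, p18 and m are hypothetical.

```stata
clear
input units price pack type
32 4 6 1
2 20 18 1
34 5 6 1
32 8 6 0
29 5 6 0
5 10 12 0
7 10 12 0
1 10 18 0
end

regress units price type i.pack

* pack = 6 is the base level, so it contributes no extra term
lincom _cons + 5*price + 1*type
scalar p6 = r(estimate)

* pack = 18: add the coefficient on the 18.pack indicator
lincom _cons + 5*price + 1*type + 18.pack
scalar p18 = r(estimate)

* one prediction per level of pack, with price and type held fixed
margins pack, at(price=5 type=1)
matrix m = r(b)               // columns follow the levels 6, 12, 18
matrix list m
```

The first lincom result should reproduce the constant shown for yhat5 and yhat6 in the listing above (32.764151), and lincom also reports a standard error and confidence interval. With a many-level factor such as a 52-week variable, the same margins call would presumably return one prediction per week without any indicator being typed out by hand.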
