python - Fixed effect in Pandas or Statsmodels

Question

Is there an existing function in Pandas or Statsmodels to estimate fixed-effects models (one-way or two-way)?

There used to be a function in Statsmodels, but it seems to have been discontinued. In Pandas there is something called plm, but I can't import it or run it using pd.plm().

Please keep it to one question per question. Also, please explain what you mean by "i can't". Please include full tracebacks (if they exist) and a sample that is small and runnable on its own and that reproduces the problem. — Veedrac
Also don't avoid telling us relevant information. "there used to be a function" implies you know what that function is, so why you avoid telling us confuses me. — Veedrac
Since fixed effects is fully equivalent to OLS with properly demeaned target variables, why don't you just do the demeaning first and then run OLS, like this set of examples? I hope this is for some assignment or something though, because as a Bayesian it makes me sad, since every time someone uses fixed effects an angel loses its wings. — ely
@user3576212 That is unfortunate. It is very common in certain segments of social science, especially psychology and economics, that students are told to use techniques like fixed effects, but they never learn the real theory behind it. These methods are deeply flawed when used in real world settings and should never be used blindly as part of a software package, at least not until you have mastered the real theory behind it. You may find more help asking over at Cross-Validated. — ely
You're free to use whatever tools you want. I'm just saying that working in finance doing quant research has made me appreciate the criticisms of these methods more. They are not good for solving precisely the problems they are claimed to solve (such as cross-sectional correlation). It's similar with other very bad methods, like Fama-Macbeth regression. I'm not talking about anything academic, just applied econ research. — ely

Karl D. · Accepted Answer · 2014-06-13T01:12:37

As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options:

If you use Python 3, you can use linearmodels as specified in the more recent answer: https://stackoverflow.com/a/44836199/3435183
Just specify the various dummies in your statsmodels specification, e.g. using pd.get_dummies. This may not be feasible if the number of fixed effects is large.

Or do some groupby-based demeaning and then use statsmodels (this works even if you're estimating lots of fixed effects). Here is a barebones version of what you could do for one-way fixed effects:

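import statsmodels.api as sm
import patsy

def areg(formula, data=None, absorb=None, cluster=None):
    # Build the response and design matrices from the formula
    y, X = patsy.dmatrices(formula, data, return_type='dataframe')

    # Demean y within each absorbed group, then add back the overall mean
    ybar = y.mean()
    y = y - y.groupby(data[absorb]).transform('mean') + ybar

    # Same transform for the regressors (the intercept column stays at 1)
    Xbar = X.mean()
    X = X - X.groupby(data[absorb]).transform('mean') + Xbar

    reg = sm.OLS(y, X)
    # Account for df loss from FE transform
    reg.df_resid -= (data[absorb].nunique() - 1)

    # Plain OLS standard errors unless a cluster column is given
    if cluster is None:
        return reg.fit()
    return reg.fit(cov_type='cluster', cov_kwds={'groups': data[cluster].values})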

For example, suppose you have a panel of stock data: returns and other stock-level data for all stocks, observed every month over a number of months. You want to regress returns on lagged returns with calendar-month fixed effects (where the calendar month variable is called caldt), and you also want to cluster the standard errors by calendar month. You can estimate such a fixed-effects model with the following:

reg0 = areg('ret~retlag',data=df,absorb='caldt',cluster='caldt')
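
As a rough cross-check of the dummy-variable and demeaning options, the sketch below fits the same one-way model both ways on synthetic data. The data, the seed and the effect sizes are assumptions made up for illustration; only the column names ret, retlag and caldt come from the example above. The point estimates should agree, and the standard errors should agree once the demeaned regression's degrees of freedom are corrected for the absorbed month means.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel (an assumption for illustration): 50 stocks over 24 months
rng = np.random.default_rng(0)
n_stocks, n_months = 50, 24
df = pd.DataFrame({
    "caldt": np.repeat(np.arange(n_months), n_stocks),
    "retlag": rng.normal(size=n_stocks * n_months),
})
month_effect = rng.normal(size=n_months)[df["caldt"].to_numpy()]
df["ret"] = 0.1 * df["retlag"] + month_effect + rng.normal(size=len(df))

# Option A: one dummy per calendar month via the formula interface
dummy_fit = smf.ols("ret ~ retlag + C(caldt)", data=df).fit()

# Option B: subtract calendar-month means, then run OLS without an intercept
cols = ["ret", "retlag"]
demeaned = df[cols] - df.groupby("caldt")[cols].transform("mean")
within_fit = smf.ols("ret ~ retlag - 1", data=demeaned).fit()

beta_dummy = dummy_fit.params["retlag"]
beta_within = within_fit.params["retlag"]

# The demeaned fit overstates the residual degrees of freedom;
# rescaling by the dummy fit's df_resid recovers the same standard error
se_dummy = dummy_fit.bse["retlag"]
se_within_adj = within_fit.bse["retlag"] * np.sqrt(
    within_fit.df_resid / dummy_fit.df_resid
)

print(beta_dummy, beta_within)
print(se_dummy, se_within_adj)
```

The rescaling step is the same idea as the df_resid adjustment inside areg: without it, the demeaned regression reports standard errors that are slightly too small.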

And here is what you can do if you are using an older version of Pandas (before 0.20.0):

Here is an example with time fixed effects using pandas' PanelOLS (which is in the plm module). Note the import of PanelOLS:

>>> from pandas.stats.plm import PanelOLS
>>> df

                y    x
date       id
2012-01-01 1   0.1  0.2
           2   0.3  0.5
           3   0.4  0.8
           4   0.0  0.2
2012-02-01 1   0.2  0.7 
           2   0.4  0.5
           3   0.2  0.3
           4   0.1  0.1
2012-03-01 1   0.6  0.9
           2   0.7  0.5
           3   0.9  0.6
           4   0.4  0.5

Note that the dataframe must have a MultiIndex set; PanelOLS determines the time and entity effects based on the index:

>>> reg  = PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
>>> reg

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x>

Number of Observations:         12
Number of Degrees of Freedom:   4

R-squared:         0.2729
Adj R-squared:     0.0002

Rmse:              0.1588

F-stat (1, 8):     1.0007, p-value:     0.3464

Degrees of Freedom: model 3, resid 8

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.3694     0.2132       1.73     0.1214    -0.0485     0.7872
---------------------------------End of Summary---------------------------------

Docstring:

PanelOLS(self, y, x, weights = None, intercept = True, nw_lags = None,
entity_effects = False, time_effects = False, x_effects = None,
cluster = None, dropped_dummies = None, verbose = False,
nw_overlap = False)

Implements panel OLS.

See ols function docs

This is another function (like fama_macbeth) where I believe the plan is to move this functionality to statsmodels.
