7
votes

I have a balanced panel data set, df, that essentially consists of three variables, A, B and Y, which vary over time for a set of uniquely identified regions. I would like to run a regression that includes both regional (region in the code below) and time (year) fixed effects. If I'm not mistaken, I can achieve this in different ways:

lm(Y ~ A + B + factor(region) + factor(year), data = df)

or

library(plm)
plm(Y ~ A + B, 
    data = df, index = c('region', 'year'), model = 'within',
    effect = 'twoways')

In the second call I specify the indices (region and year), the model type ('within', i.e. fixed effects), and the kind of fixed effects ('twoways', meaning that I'm including both region and time FE).

Although I seem to be doing things correctly, I get extremely different results from the two approaches. The problem disappears when I leave out the time fixed effects and use the argument effect = 'individual'. What's the deal here? Am I missing something? Are there any other R packages that would let me run the same analysis?

2
Results for variables A and B should be the same. The lm approach (LSDV) will give you estimates of the individual and time fixed effects and an intercept as well. – Helix123
Two ideas. First, in the lm command, specify the formula as you have it, but add a -1 to the end. As pointed out above, this will remove the intercept, which plm won't add automatically. Second, rather than factor, have you tried as.factor? – Moritz Schwarz

2 Answers

13
votes

Perhaps posting an example of your data would help answer the question. I am getting the same coefficients for some made up data. You can also use felm from the package lfe to do the same thing:

N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)


model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
#  (Intercept)       -0.0522691  0.1422052   -0.368   0.7132    
#  a                  1.9982165  0.0101501  196.866   <2e-16 ***
#  b                 -1.4787359  0.0101666 -145.450   <2e-16 ***

library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))

model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)

# Coefficients :
#    Estimate Std. Error t-value  Pr(>|t|)    
# a  1.998217   0.010150  196.87 < 2.2e-16 ***
# b -1.478736   0.010167 -145.45 < 2.2e-16 ***

library(lfe)

model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)

# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# a  1.99822    0.01015   196.9   <2e-16 ***
# b -1.47874    0.01017  -145.4   <2e-16 ***
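
For intuition about why the three fits above agree, here is a minimal base-R sketch of the two-way within transformation done by hand. It assumes a balanced panel (the simple double-demeaning formula below is not valid otherwise), it uses a smaller made-up data set than the one above, and demean2 is a hypothetical helper name, not a function from plm or lfe.

```r
set.seed(1)
n_region <- 30
n_year <- 20
N <- n_region * n_year

# Balanced made-up panel: every region is observed in every year
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:n_region, each = n_year),
                 year = rep(1:n_year, times = n_region))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)

# Hypothetical helper: subtract region means and year means, add back the
# grand mean. This closed form only holds for a balanced panel.
demean2 <- function(x, g1, g2) x - ave(x, g1) - ave(x, g2) + mean(x)

df$y_w <- demean2(df$y, df$region, df$year)
df$a_w <- demean2(df$a, df$region, df$year)
df$b_w <- demean2(df$b, df$region, df$year)

# OLS on the transformed data, without an intercept
within_fit <- lm(y_w ~ a_w + b_w - 1, data = df)

# LSDV: dummies for every region and every year
lsdv_fit <- lm(y ~ a + b + factor(region) + factor(year), data = df)

coef_within <- unname(coef(within_fit))
coef_lsdv <- unname(coef(lsdv_fit)[c("a", "b")])
print(all.equal(coef_within, coef_lsdv))
```

The slope estimates coincide, but the standard errors reported by lm on the demeaned data are not comparable, because lm does not know about the degrees of freedom used up by the region and year means.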
0
votes

This does not seem to be a data issue.

I'm doing computer exercises in R from Wooldridge (2012) Introductory Econometrics. Specifically Chapter 14 CE.1 (data is the rental file at: https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041)

I computed the model in differences (in Python):

import statsmodels.formula.api as smf
model_diff = smf.ols(formula='diff_lrent ~ diff_lpop + diff_lavginc + diff_pctstu', data=rental).fit()

OLS Regression Results

==============================================================================
Dep. Variable:             diff_lrent   R-squared:                       0.322
Model:                            OLS   Adj. R-squared:                  0.288
Method:                 Least Squares   F-statistic:                     9.510
Date:                Sun, 05 Nov 2017   Prob (F-statistic):           3.14e-05
Time:                        00:46:55   Log-Likelihood:                 65.272
No. Observations:                  64   AIC:                            -122.5
Df Residuals:                      60   BIC:                            -113.9
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.3855      0.037     10.469      0.000       0.312       0.459
diff_lpop        0.0722      0.088      0.818      0.417      -0.104       0.249
diff_lavginc     0.3100      0.066      4.663      0.000       0.177       0.443
diff_pctstu      0.0112      0.004      2.711      0.009       0.003       0.019
==============================================================================
Omnibus:                        2.653   Durbin-Watson:                   1.655
Prob(Omnibus):                  0.265   Jarque-Bera (JB):                2.335
Skew:                           0.467   Prob(JB):                        0.311
Kurtosis:                       2.934   Cond. No.                         23.0
==============================================================================

Now, the plm package in R gives the same results for the first-difference model:

library(plm)
modelfd <- plm(lrent ~ lpop + lavginc + pctstu, data = data, model = "fd")

No problem so far. However, the fixed effects model reports different estimates.

modelfx <- plm(lrent ~ lpop + lavginc + pctstu, data = data, model = "within", effect = "time")
summary(modelfx)

The FE results should not be any different. In fact, the Computer Exercise question is:

(iv) Estimate the model by fixed effects to verify that you get identical estimates and standard errors to those in part (iii).

My best guess is that I am misunderstanding something about the R package.
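
One possible explanation, sketched in base R on simulated data rather than the rental file: with two periods, a first-difference regression with an intercept corresponds to a fixed effects model that has unit effects plus a period dummy, whereas effect = "time" removes only the period means and leaves the unit effects in the error term. In plm terms that would presumably mean effect = "twoways" instead of effect = "time". All variable names and parameter values below are made up for illustration.

```r
set.seed(42)
n <- 64                                   # number of units, two periods each
id <- rep(1:n, each = 2)
year <- rep(c(0, 1), times = n)
alpha <- rep(rnorm(n), each = 2)          # unobserved unit effect
x <- rnorm(2 * n) + alpha                 # regressor correlated with the unit effect
y <- 0.4 * year + 0.3 * x + alpha + rnorm(2 * n, sd = 0.1)
panel <- data.frame(id, year, x, y)

# First differences; rows are sorted by id and then year, so the two
# subsets line up unit by unit
dy <- panel$y[panel$year == 1] - panel$y[panel$year == 0]
dx <- panel$x[panel$year == 1] - panel$x[panel$year == 0]
fd_fit <- lm(dy ~ dx)                     # intercept = period effect

# Fixed effects via dummies: unit effects plus a period dummy
fe_fit <- lm(y ~ x + factor(year) + factor(id), data = panel)

# Period effects only, analogous to effect = "time"
time_fit <- lm(y ~ x + factor(year), data = panel)

fd_b <- unname(coef(fd_fit)["dx"])
fe_b <- unname(coef(fe_fit)["x"])
time_b <- unname(coef(time_fit)["x"])
fd_se <- coef(summary(fd_fit))["dx", "Std. Error"]
fe_se <- coef(summary(fe_fit))["x", "Std. Error"]

print(c(fd = fd_b, fe = fe_b, time_only = time_b))
print(c(fd_se = fd_se, fe_se = fe_se))
```

In this sketch the first-difference and unit-plus-period fits agree on both the slope and its standard error, while the period-only fit does not, which mirrors the discrepancy described above.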