1
votes

My question is: how do I include industry and year fixed effects in plm when I have multiple firms in the same industry in the same year? A reprex of my data looks like this:

Year    Industry    CompanyID   CEOID   CEO.background  MBA.CEO CEO.Tenure  Female.CEO  CEO.age Capex       Log.TA      Leverage
2005    6           1075        10739   0               0       6.92        0           55      0.08623238  9.199961396 0.330732917
2006    6           1075        10739   0               0       7.92        0           56      0.097455145 9.334559982 0.26575725
2007    6           1075        10739   0               0       8.92        0           56      0.113033772 9.346263914 0.285439531
2008    6           1075        10739   0               0       9.92        0           57      0.108640177 9.327564318 0.322985772
2009    6           1075        5835    0               0       0.67        0           54      0.08526524  9.360491034 0.333880116
2010    6           1075        5835    0               0       1.67        0           55      0.081452292 9.376545673 0.32197511
2005    6           1743        8379    0               0       17.43       0           65      0.236487293 6.693007633 0.021915227
2006    6           1743        26012   0               1       0.91        0           59      0.319264835 6.820455133 0.023157959
2007    6           1743        26012   0               1       1.91        0           58      0.207384938 6.844512984 0.020087012
2008    6           1743        26012   0               1       2.92        0           59      0.130632264 6.890964093 0.017103795
2009    6           1743        26012   0               1       3.92        0           60      0.112029325 6.879662342 0.017283796
2010    6           1743        30801   0               0       1           1           47      0.02804693  6.767971236 0.044755539
2005    7           1004        9249    0               0       9.65        0           53      0.076370794 6.596094672 0.31534354
2006    7           1004        9249    0               0       10.65       0           54      0.114891589 6.886346743 0.327808308
2007    7           1004        9249    0               0       11.65       0           55      0.097727719 6.973199328 0.307086799
2008    7           1004        9249    0               0       12.65       0           56      0.112119583 7.216716829 0.389800369
2009    7           1004        9249    0               0       13.65       0           57      0.086281135 7.228033526 0.331455792
2010    7           1004        9249    0               0       14.65       0           58      0.298922358 7.313914813 0.291147083

CEO.background, MBA.CEO, and Female.CEO are time-invariant dummies for each CEO, and Industry is a time-invariant dummy for each firm; the rest are time-varying firm/CEO attributes.

I would like to run the following fixed effects for industry/year regression code:

plm(Capex ~ CEO.background + MBA.CEO + CEO.Tenure + Female.CEO + CEO.age + Log.TA + Leverage, data=repexcapex, index = (c("Industry", "Year")), model = "within", effect = "twoways")

However, if I have multiple companies in the same industry, as in the data above (company IDs 1075 and 1743 are both in industry 6), the code gives an error about duplicates.

Error in pdim.default(index[[1]], index[[2]]) : 
  duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
[...]

If I drop the first 6 rows (company 1075) and run it with just one firm per industry, the code works.

How should I formulate my regression to be able to include both industry and year fixed effects? Is running the code with industry dummies like below equivalent to industry fixed effects:

plm(Capex ~ CEO.background + MBA.CEO + CEO.Tenure + Female.CEO + CEO.age + Log.TA + Leverage + factor(Industry), data=repexcapex, index = (c("Year")), model = "within", effect = "individual")

This is the formatted data:

repexcapex <- read.table(text="
Year,Industry,CompanyID,CEOID,CEO.background,MBA.CEO,CEO.Tenure,Female.CEO,CEO.age,Capex,Log.TA,Leverage
2005,6,1075,10739,0,0,6.92,0,55,0.08623238,9.199961396,0.330732917
2006,6,1075,10739,0,0,7.92,0,56,0.097455145,9.334559982,0.26575725
2007,6,1075,10739,0,0,8.92,0,56,0.113033772,9.346263914,0.285439531
2008,6,1075,10739,0,0,9.92,0,57,0.108640177,9.327564318,0.322985772
2009,6,1075,5835,0,0,0.67,0,54,0.08526524,9.360491034,0.333880116
2010,6,1075,5835,0,0,1.67,0,55,0.081452292,9.376545673,0.32197511
2005,6,1743,8379,0,0,17.43,0,65,0.236487293,6.693007633,0.021915227
2006,6,1743,26012,0,1,0.91,0,59,0.319264835,6.820455133,0.023157959
2007,6,1743,26012,0,1,1.91,0,58,0.207384938,6.844512984,0.020087012
2008,6,1743,26012,0,1,2.92,0,59,0.130632264,6.890964093,0.017103795
2009,6,1743,26012,0,1,3.92,0,60,0.112029325,6.879662342,0.017283796
2010,6,1743,30801,0,0,1,1,47,0.02804693,6.767971236,0.044755539
2005,7,1004,9249,0,0,9.65,0,53,0.076370794,6.596094672,0.31534354
2006,7,1004,9249,0,0,10.65,0,54,0.114891589,6.886346743,0.327808308
2007,7,1004,9249,0,0,11.65,0,55,0.097727719,6.973199328,0.307086799
2008,7,1004,9249,0,0,12.65,0,56,0.112119583,7.216716829,0.389800369
2009,7,1004,9249,0,0,13.65,0,57,0.086281135,7.228033526,0.331455792
2010,7,1004,9249,0,0,14.65,0,58,0.298922358,7.313914813,0.291147083",
sep=",",header=TRUE)
Hi, welcome to Stack Overflow. Consider looking up how to make a reproducible example, especially sharing a small subset of data that is sufficient to replicate the issue. – Mark Neal
Hey Mark, thank you for the comment! I have included a reprex of the problem now. Let me know if this data formatting doesn't work. – PSL
It seems that the plm package has changed a bit. If you want to use it, I would recommend this doc, and this one if you want to estimate your model without the plm package. I would still recommend learning the plm package, though I must admit I'm a bit rusty on it myself. – DJJ
A statistical hint: if you have CEO age and CEO tenure in the same regression of a within model, you might want to look at example 2 in ?plm::detect.lindep, where it is assumed a CEO stays with the firm for the whole time. – Helix123
@DJJ: Thank you for the documents. I'll read through them and see if I can understand how to do this better based on them. – PSL

1 Answer

0
votes

Your dependent variable Capex appears to be a company-specific measure, so the unit of observation (what plm calls the individual dimension) is most likely the company (variable CompanyID). That is the variable to specify in the index argument, together with Year.

Thus, a basic 2-way model can be estimated by:

plm(Capex ~ CEO.background + MBA.CEO + CEO.Tenure + Female.CEO + CEO.age + Log.TA + Leverage, data=repexcapex, index = (c("CompanyID", "Year")), model = "within", effect = "twoways")

To add industry fixed effects, include + factor(Industry) in the formula. This variable will likely drop out of the estimation because it is collinear with the company fixed effects: each company stays in one industry, so Industry does not vary within a company (this is the case for the small sample you provided).
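
As a rough sketch of the points above, the snippet below first checks which pair of variables uniquely identifies a row, then fits the company/year two-way model. The shortened regressor list and the second model (industry and year dummies in a pooled regression, with no company effects) are my own assumptions for illustration, not something taken from the question or answer; adjust the formula to your full specification.

```r
library(plm)

repexcapex <- read.table(text = "
Year,Industry,CompanyID,CEOID,CEO.background,MBA.CEO,CEO.Tenure,Female.CEO,CEO.age,Capex,Log.TA,Leverage
2005,6,1075,10739,0,0,6.92,0,55,0.08623238,9.199961396,0.330732917
2006,6,1075,10739,0,0,7.92,0,56,0.097455145,9.334559982,0.26575725
2007,6,1075,10739,0,0,8.92,0,56,0.113033772,9.346263914,0.285439531
2008,6,1075,10739,0,0,9.92,0,57,0.108640177,9.327564318,0.322985772
2009,6,1075,5835,0,0,0.67,0,54,0.08526524,9.360491034,0.333880116
2010,6,1075,5835,0,0,1.67,0,55,0.081452292,9.376545673,0.32197511
2005,6,1743,8379,0,0,17.43,0,65,0.236487293,6.693007633,0.021915227
2006,6,1743,26012,0,1,0.91,0,59,0.319264835,6.820455133,0.023157959
2007,6,1743,26012,0,1,1.91,0,58,0.207384938,6.844512984,0.020087012
2008,6,1743,26012,0,1,2.92,0,59,0.130632264,6.890964093,0.017103795
2009,6,1743,26012,0,1,3.92,0,60,0.112029325,6.879662342,0.017283796
2010,6,1743,30801,0,0,1,1,47,0.02804693,6.767971236,0.044755539
2005,7,1004,9249,0,0,9.65,0,53,0.076370794,6.596094672,0.31534354
2006,7,1004,9249,0,0,10.65,0,54,0.114891589,6.886346743,0.327808308
2007,7,1004,9249,0,0,11.65,0,55,0.097727719,6.973199328,0.307086799
2008,7,1004,9249,0,0,12.65,0,56,0.112119583,7.216716829,0.389800369
2009,7,1004,9249,0,0,13.65,0,57,0.086281135,7.228033526,0.331455792
2010,7,1004,9249,0,0,14.65,0,58,0.298922358,7.313914813,0.291147083",
sep = ",", header = TRUE)

# (Industry, Year) repeats because two firms share industry 6,
# which is what triggers the "duplicate couples (id-time)" error.
dup_industry_year <- any(duplicated(repexcapex[, c("Industry", "Year")]))

# (CompanyID, Year) identifies each row, so it is a valid panel index.
dup_company_year <- any(duplicated(repexcapex[, c("CompanyID", "Year")]))

# Company and year fixed effects (regressor list shortened for illustration).
fe_company_year <- plm(Capex ~ CEO.Tenure + Log.TA + Leverage,
                       data = repexcapex,
                       index = c("CompanyID", "Year"),
                       model = "within", effect = "twoways")

# Assumed alternative: industry and year effects entered as dummies in a
# pooled regression, i.e. without company fixed effects.
pooled_industry_year <- plm(Capex ~ CEO.Tenure + Log.TA + Leverage +
                              factor(Industry) + factor(Year),
                            data = repexcapex,
                            index = c("CompanyID", "Year"),
                            model = "pooling")

summary(fe_company_year)
summary(pooled_industry_year)
```

The two models answer different questions: the first absorbs everything that is constant within a company (including its industry), while the second only controls for industry-level and year-level differences.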