0
votes

I am trying to run a multinomial logit with year fixed effects in mlogit in Stata (panel data: year-country), but I do not get standard errors for some of the models. When I run the same model using multinom in R I get both coefficients and standard errors.

I do not use Stata frequently, so I may be missing something or I may be running different models in Stata and R and should not be comparing them in the first place. What may be happening?

So a few details about the simple version of the model of interest:

I created a data example to show what the problem is

  • Dependent variable (will call it DV1) with 3 categories of -1, 0, 1 (unordered and 0 as reference)

  • Independent variables: 2 continuous variables, 3 binary variables, interaction of 2 of the 3 binary variables

  • Years: 1995-2003

  • Number of observations in the model: 900

In R I run the code and get coefficients and standard errors as below.

R version of code creating data and running the model:

## Fabricate example data 
library(fabricatr)
data <- fabricate(
  N = 900,
  id = rep(1:900, 1),
  IV1 = draw_binary(0.5, N = N),
  IV2 = draw_binary(0.5, N = N),
  IV3 = draw_binary(0.5, N = N),
  IV4 = draw_normal_icc(mean = 3, N = N, clusters = id, ICC = 0.99),
  IV5 = draw_normal_icc(mean = 6, N = N, clusters = id, ICC = 0.99))

library(AlgDesign)
DV = gen.factorial(c(3), 1, center=TRUE, varNames=c("DV"))
year = gen.factorial(c(9), 1, center=TRUE, varNames=c("year"))

DV = do.call("rbind", replicate(300, DV, simplify = FALSE))
year = do.call("rbind", replicate(100, year, simplify = FALSE))

year[year==-4]= 1995
year[year==-3]= 1996
year[year==-2]= 1997
year[year==-1]= 1998
year[year==0]= 1999
year[year==1]= 2000
year[year==2]= 2001
year[year==3]= 2002
year[year==4]= 2003

data1=cbind(data, DV, year)
data1$DV1 = relevel(factor(data1$DV), ref = "0")

## Save data as csv file (to use in Stata)
library(foreign)
write.csv(data1, "datafile.csv", row.names=FALSE)

## Run multinom
library(nnet)
model1 <- multinom(DV1 ~ IV1 + IV2 + IV3 + IV4 + IV5 + IV1*IV2 + as.factor(year), data = data1)

Results from R

When I run the model using mlogit (without fixed effects) in Stata I get both coefficients and standard errors.

So I tried including year fixed effects in the model using Stata three different ways and none worked:

  1. femlogit
  • factor-variable and time-series operators not allowed
  • depvar and indepvars may not contain factor variables or time-series operators
  1. mlogit
  • fe option: fe not allowed
  • used i.year: omits certain variables and/or does not give me standard errors and only shows coefficients (example in code below)
* Read file
  import delimited using "datafile.csv", clear case(preserve)
  
* Run regression
    mlogit DV1 IV1 IV2 IV3 IV4 IV5 IV1##IV2 i.year, base(0) iterate(1000)

Stata results

  1. xtmlogit
  • error - does not run
  • error message: total number of permutations is 2,389,461,218; this many permutations require a considerable amount of memory and can result in long run times; use option force to proceed anyway, or consider using option rsample()
1
These questions are difficult to ask and difficult to answer, really presuming here some reader with very detailed understanding of the functionality of both Stata and R in this territory (not me). Generic advice: Make your question self-contained with a minimal data example and code that is sufficient to see the problem. I doubt that the minimal example would need six predictors. Also, don't ask several questions at once: your last paragraph slips in something different.Nick Cox
Thanks for the comment, I edited to include a data example to add some clarity in my question.Emily
Your mlogit Stata results do not converge, possibly because there is no relation between $y$ and $X$ in your random data. With your observed data, multinom and femlogit may give different results because they estimate different models (and may have different convergence criteria). As far as I know, femlogit (and now xtmlogit) are the only fixed-effect multinomial logit commands/functions in public Stata, R, and Python packages. I think femlogit will converge (if convergence is possible) if you follow @Danferno's answer.Richard Herron

1 Answers

1
votes

Fixed effects and non-linear models (such as logits) are an awkward combination. In a linear model you can simply add dummies/demean to get rid of a group-specific intercept, but in a non-linear model none of that works. I mean you could do it technically (which I think is what the R code is doing) but conceptually it is very unclear what that actually does.

Econometricians have spent a lot of time working on this, which has led to some work-arounds, usually referred to as conditional logit. IIRC this is what's implemented in femlogit. I think the mistake in your code is that you tried to include the fixed effects through a dummy specification (i.year). Instead, you should xtset your data and then run femlogit without the dummies.

xtset year
femlogit DV1 IV1 IV2 IV3 IV4 IV5 IV1##IV2

Note that these conditional logit models can be very slow. Personally, I'm more a fan of running two one-vs-all linear regressions (1=1 and 0/-1 set to zero, then -1=1 and 0/1 set to zero). However, opinions are divided (Wooldridge appears to be a fan too, many others very much not so).