0
votes

I am doing an analysis in Stata of the determinants of census tract unemployment rates. Some of the previous literature on my topic has used straight OLS regression, and I started with this type of analysis, but it seems to me after my own further reading that a Generalized Linear Model is better. This is especially because I am interested in presenting predicted values for the census tracts' unemployment rates based on my regression and I would like these to be appropriately bounded (between 0% and 100% inclusive). My unemployment rates include 0s for some census tracts so I would need to take this into account.

My questions are:

  1. whether Stata's fracreg logit is equivalent to the program's glm with a logit link and binomial family? (I have read about using the glm version in a few places including here but see that fracreg is a new-ish command which seems to serve the same purpose). Can I specify an equivalent to the robust option when using fracreg logit?

  2. if using fracreg, on what basis should I decide to use a fractional probit (fracreg probit) or fractional logit (fracreg logit) regression?

  3. a simple (probably ignorant) question of interpretation: I see that the fracreg and glm regressions mentioned above don't report an R-squared value. Is there an equivalent measure I can calculate for these regressions? My OLS R-squared values have been reasonably high, which has been a point of reassurance for me, so I'd like to see how these models compare (though I know R-squared isn't everything!).

  4. if using these models are there any additional restrictions or assumptions (such as additional assumptions beyond the BLUE of OLS) that I should keep in mind? With my OLS regressions I have taken the natural log of unemployment rates (makes my residuals more normal, higher R-squared, and convenient interpretation). Could I do the same with the fracreg or glm regressions above?
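For concreteness, the two specifications compared in question 1, and one possible stand-in for the R-squared asked about in question 3, can be sketched as below. This is only an illustrative sketch: the variable names `unemp_rate`, `x1`, `x2`, `unemp_hat_frac` and `unemp_hat_glm` are hypothetical, and `unemp_rate` is assumed to be stored as a proportion in [0, 1] (a rate recorded in percent would first need dividing by 100). The squared correlation between observed and fitted values is one commonly used fit measure, not an exact equivalent of the OLS R-squared.

```stata
* Hypothetical names: unemp_rate is a proportion in [0,1]; x1 and x2 are tract-level covariates

* Fractional logit (Stata 14 or later); fracreg reports robust standard errors by default
fracreg logit unemp_rate x1 x2
predict double unemp_hat_frac        // fitted conditional mean, bounded between 0 and 1

* GLM with binomial family and logit link; vce(robust) requests robust standard errors
glm unemp_rate x1 x2, family(binomial) link(logit) vce(robust)
predict double unemp_hat_glm         // default prediction after glm is the fitted mean (mu)

* The two sets of fitted values are expected to coincide up to numerical precision
summarize unemp_hat_frac unemp_hat_glm

* One possible fit measure: squared correlation between observed and fitted values
quietly correlate unemp_rate unemp_hat_frac
display "Squared correlation (observed vs. fitted): " r(rho)^2
```

Because both commands model the conditional mean through the logit link, zeros in the outcome are allowed and no log transformation of the dependent variable is needed for the predictions to stay within bounds.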

It's been a while since I formally studied limited dependent variables so please excuse my ignorance on these issues.

I have cross-posted this question at Statalist here.

1
I'm voting to close this question as off-topic because there is no programming issue. It's about which command to use for which statistical purpose and various statistical questions. Statalist is right for this, in my view. Cross Validated is not an especially good fit because so much here is software-specific. - Nick Cox

1 Answer

0
votes

This isn't Stata-specific, but check out Paolino's 2001 "Maximum Likelihood Estimation of Models with Beta-Distributed Dependent Variables"; at a minimum, it offers a literature review on why OLS gives biased estimators.

Follow-up: someone did make a Stata solution. See Buckley, Jack. 2003. "Estimation of Models with Beta-Distributed Dependent Variables: A Replication and Extension of Paolino's Study." Political Analysis 11(2): 204-205.