
I would like to run a fixed-effects model using OLS with weighted data.

Since the term can cause confusion: I use "fixed effects" here in the sense economists usually mean, i.e. a "within" model, in other words individual-specific effects. What I actually have is "multilevel" data, i.e. observations of individuals, and I would like to control for their region of origin (and get correspondingly clustered standard errors).

Sample data:

    library(multilevel)
    data(bhr2000)
    weight <- runif(length(bhr2000$GRP),min=1,max=10)
    bhr2000 <- data.frame(bhr2000,weight)
    head(bhr2000)
      GRP AF06 AF07 AP12 AP17 AP33 AP34 AS14 AS15 AS16 AS17 AS28 HRS RELIG   weight
    1   1    2    2    2    4    3    3    3    3    5    5    3  12     2 6.647987
    2   1    3    3    3    1    4    3    3    4    3    3    3  11     1 6.851675
    3   1    4    4    4    4    3    4    4    4    2    3    4  12     3 8.202567
    4   1    3    4    4    4    3    3    3    3    3    3    4   9     3 1.872407
    5   1    3    4    4    4    4    4    3    4    2    4    4   9     3 4.526455
    6   1    3    3    3    3    4    4    3    3    3    3    4   8     1 8.236978

The kind of model I would like to estimate is:

    AF06_ij = beta_0 + beta_1 * AP34_ij + alpha_1 * (GRP == 1) + alpha_2 * (GRP == 2) + ... + e_ij

where i indexes individuals and j indexes the group (GRP) they belong to.

Moreover, I would like observations to be weighted by the weight variable (sampling weights).
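A minimal sketch of the weighted dummy-variable (LSDV) version of this equation, using base R's lm() on simulated data. The group effects, the slope of 0.4 and the sample sizes are made-up assumptions standing in for bhr2000. Because the equation has an intercept beta_0, not every alpha_j is identified, so factor(GRP) drops the first level and the remaining alpha_j are measured relative to group 1. lm() treats weights as precision weights, but the point estimates are the same weighted least-squares estimates you would get with sampling weights.

```r
# Simulated stand-in for bhr2000 (hypothetical values, base R only)
set.seed(1)
n_grp <- 5
n_per <- 20
n <- n_grp * n_per
GRP <- rep(seq_len(n_grp), each = n_per)
AP34 <- sample(1:5, n, replace = TRUE)
alpha <- c(0.5, -0.3, 1.0, 0.0, 0.8)            # assumed group effects
AF06 <- 1 + 0.4 * AP34 + alpha[GRP] + rnorm(n, sd = 0.5)
weight <- runif(n, min = 1, max = 10)           # assumed sampling weights
dat <- data.frame(GRP, AP34, AF06, weight)

# Weighted least squares with one dummy per group
# (the first GRP level is absorbed by the intercept)
fit <- lm(AF06 ~ AP34 + factor(GRP), data = dat, weights = weight)
coef(fit)

# The same estimates from the weighted normal equations (X'WX) b = X'Wy
X <- model.matrix(fit)
b <- solve(crossprod(X, X * dat$weight), crossprod(X, dat$weight * dat$AF06))
```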

In addition, I would like to get "clustered standard errors" to allow for possible GRP-specific heteroskedasticity. In other words, E(e_ij) = 0 but Var(e_ij) = sigma_j^2, where sigma_j can differ for each group j.
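For the clustered standard errors, here is a minimal base-R sketch of the cluster-robust (CR0) sandwich estimator for a weighted least-squares fit, on simulated data with made-up parameters. The helper cluster_vcov() is hypothetical, written only for illustration. It sums the weighted score contributions x_i * w_i * e_i within each GRP, which allows both GRP-specific variances and arbitrary correlation within a GRP. It applies no small-sample correction, so its numbers will differ somewhat from packaged implementations such as sandwich::vcovCL, which as far as I know adjusts by default. With GRP dummies in the model only the standard error on AP34 is meaningful, and with few groups clustered standard errors are themselves unreliable.

```r
# Simulated stand-in data (hypothetical parameters, base R only)
set.seed(2)
n_grp <- 10
n_per <- 30
n <- n_grp * n_per
GRP <- rep(seq_len(n_grp), each = n_per)
sigma_j <- runif(n_grp, min = 0.3, max = 1.5)   # GRP-specific error SDs
AP34 <- sample(1:5, n, replace = TRUE)
AF06 <- 1 + 0.4 * AP34 + rnorm(n_grp)[GRP] + rnorm(n, sd = sigma_j[GRP])
weight <- runif(n, min = 1, max = 10)
dat <- data.frame(GRP, AP34, AF06, weight)

fit <- lm(AF06 ~ AP34 + factor(GRP), data = dat, weights = weight)

# Hypothetical helper: CR0 sandwich V = B M B, where
#   B = (X'WX)^-1  and  M = sum_g s_g s_g',  s_g = sum_{i in g} x_i w_i e_i
cluster_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)
  w <- weights(fit)
  e <- residuals(fit)                              # unweighted residuals y - X b
  bread <- solve(crossprod(X, X * w))
  scores <- rowsum(X * (w * e), group = cluster)   # one row per cluster
  meat <- crossprod(scores)
  bread %*% meat %*% bread
}

V <- cluster_vcov(fit, dat$GRP)
se_cluster <- sqrt(diag(V))
se_cluster["AP34"]
```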

If I understand correctly, nlme and lme4 can only estimate random-effects models (so-called mixed models), not fixed-effects models in the within sense.
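For what it's worth, the "within" estimator and the dummy-variable regression give the same slope (the Frisch-Waugh-Lovell result), and this carries over to weighted data if you demean with weighted group means. Below is a minimal base-R sketch on simulated data with hypothetical values; wmean_by() is an illustrative helper, not a library function. The standard errors from the demeaned regression are not directly comparable, because lm() does not know the group means were estimated.

```r
# Simulated stand-in data (hypothetical values, base R only)
set.seed(3)
n_grp <- 8
n_per <- 25
n <- n_grp * n_per
GRP <- rep(seq_len(n_grp), each = n_per)
AP34 <- sample(1:5, n, replace = TRUE) + GRP %% 3   # x correlated with group
AF06 <- 1 + 0.4 * AP34 + rnorm(n_grp)[GRP] + rnorm(n, sd = 0.5)
weight <- runif(n, min = 1, max = 10)
dat <- data.frame(GRP, AP34, AF06, weight)

# Illustrative helper: each observation's weighted group mean of v
wmean_by <- function(v, g, w) {
  m <- tapply(v * w, g, sum) / tapply(w, g, sum)
  as.numeric(m[as.character(g)])
}

# Within transformation: subtract weighted group means, then fit without intercept
dat$y_dm <- dat$AF06 - wmean_by(dat$AF06, dat$GRP, dat$weight)
dat$x_dm <- dat$AP34 - wmean_by(dat$AP34, dat$GRP, dat$weight)
within_fit <- lm(y_dm ~ 0 + x_dm, data = dat, weights = weight)

# Dummy-variable (LSDV) regression for comparison
lsdv_fit <- lm(AF06 ~ AP34 + factor(GRP), data = dat, weights = weight)

c(within = coef(within_fit)[["x_dm"]], lsdv = coef(lsdv_fit)[["AP34"]])
```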

I tried the plm package, which looked ideal for what I want to do, but it does not allow for weights. Any other ideas?

Questions with no data, no concrete problem description, and requesting both a recommendation for an alternate package and a worked example are really over the line as "too vague" at least for SO. You should get your general statistical advice in one of the venues that solicit such questions. - IRTFM
You're right. I amended my question to make it clearer what I would like to do. Thanks! - Peutch
Here is some interesting reading on fixed/random effects. andrewgelman.com/2005/01/25/why_i_dont_use - miles2know
Thank you, that's why I put the expression in quotes: it's confusing as hell! I also edited my question to include the specific econometric model I am trying to estimate, so it should be clearer now... - Peutch

2 Answers


I think this is more of a Stack Exchange question, but fixed effects and model weights aside, you shouldn't be using OLS for an ordered categorical response variable. This calls for an ordered logistic model, so below I use the data you provided to fit one.

Just to be clear, we have an ordered categorical response, "AF06", and two predictors. The first, "AP34", is also an ordered categorical variable; the second, "GRP", is your fixed effect. In general, you can create a group fixed effect by coercing the grouping variable to a factor on the right-hand side of the formula. (I'm deliberately staying away from statistical theory because this isn't the place for it, so some of what I say may be imprecise.)
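To illustrate what coercing to a factor does, here is a tiny made-up example using base R's model.matrix(). Left numeric, GRP enters as a single column (a linear "effect" of the group code); wrapped in factor(), it expands into one indicator column per group, with the first group serving as the baseline absorbed by the intercept.

```r
# Tiny made-up data: three groups, two individuals each
GRP <- c(1, 1, 2, 2, 3, 3)
AP34 <- c(2, 4, 3, 5, 1, 2)

# Numeric GRP: a single column, i.e. a linear effect of the group code
X_num <- model.matrix(~ AP34 + GRP)

# factor(GRP): one dummy per group except the first (the baseline)
X_fac <- model.matrix(~ AP34 + factor(GRP))

colnames(X_num)  # "(Intercept)" "AP34" "GRP"
colnames(X_fac)  # "(Intercept)" "AP34" "factor(GRP)2" "factor(GRP)3"
X_fac
```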

The code below fits an ordered logistic model using the polr (proportional odds logistic regression) function from MASS. I've tried to follow the model specification you described, but at the end of the day OLS is not the right way forward. The call to coefplot produces a very crowded y axis; it is only meant as a rudimentary starting point for interpretation, and I would definitely visualize this in a more refined way. Interpretation will take some work on your part, but I think this is generally the right method. The best resource I can think of is chapters 5 and 6 of "Data Analysis Using Regression and Multilevel/Hierarchical Models" by Gelman and Hill. It's such a good resource that I'd recommend reading the whole book and trying to master it if you plan to do this type of analysis going forward.


    library(multilevel) # To get the data
    library(MASS)       # To get the polr modeling function
    library(arm)        # To get the tools, insight and expertise of Andrew Gelman and his team

    # The data (the weights are simulated, so set a seed for reproducible results)
    data(bhr2000)
    set.seed(123)
    bhr2000$weight <- runif(nrow(bhr2000), min = 1, max = 10)
    head(bhr2000)

    # The model: ordered logit of AF06 on AP34 with GRP fixed effects, weighted by 'weight'
    m <- polr(factor(AF06) ~ AP34 + factor(GRP), weights = weight,
              data = bhr2000, Hess = TRUE, method = "logistic")
    summary(m)
    coefplot(m, cex.var = 0.6) # from the arm package

Check out the lfe package: it does econometrics-style fixed effects, and you can specify clustering.
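A minimal sketch of how that might look with lfe's felm(), whose formula has up to four parts: y ~ regressors | fixed effects | instruments | cluster variables, with 0 meaning "none". The data below are simulated and hypothetical; with bhr2000 you would use the real columns instead. I'm assuming the weights argument takes a numeric vector of observation weights. The point estimate on AP34 should match the weighted dummy-variable regression from lm().

```r
library(lfe)

# Simulated stand-in data (hypothetical values)
set.seed(4)
n_grp <- 10
n_per <- 30
n <- n_grp * n_per
dat <- data.frame(GRP = factor(rep(seq_len(n_grp), each = n_per)))
dat$AP34 <- sample(1:5, n, replace = TRUE)
dat$AF06 <- 1 + 0.4 * dat$AP34 + rnorm(n_grp)[as.integer(dat$GRP)] + rnorm(n, sd = 0.5)
dat$weight <- runif(n, min = 1, max = 10)

# AF06 on AP34, GRP fixed effects projected out, no instruments (0),
# standard errors clustered by GRP, observations weighted by 'weight'
fe_fit <- felm(AF06 ~ AP34 | GRP | 0 | GRP, data = dat, weights = dat$weight)
summary(fe_fit)

# Point-estimate check against the weighted LSDV regression
lm_fit <- lm(AF06 ~ AP34 + GRP, data = dat, weights = weight)
```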