2
votes

I'm trying to run a weighted least squares regression. After creating my weights and passing them to lm(), I receive the following error:

Error in model.frame.default(formula = CO2_pc_cmice1 ~ GDP_pc_cmice1_C + :
variable lengths differ (found for '(weights)')

The lm model has 31 rows and the weights vector I've created also has length 31. I've checked both for NAs and there are none. There are some negative numbers, although I'd be surprised if this was the issue. I've run the formula using both na.action = na.omit and na.action = na.exclude.

I get the same error with another regression on a sample of 99.

My regression is

LinearCO2_lowerF <- (lm(CO2_pc_cmice1 ~ PolCiv_incPressFreedom_C + CorpInf_cmice1_C + 
                                        Gov_cmicepos1_C + LitGini_umice_C + 
                                        GDP_pc_cmice1_C + PopDensity_cmice1_C + 
                                        TradeOpen_cmice1_C + Urban_cmice1_C +
                                        poly(Oil_coal_umice_C,2), 
                                        data = mydata_completemice2, 
                                        subset = IncomeL == "L"))

Weights created

wtsco2low <- 1/fitted( lm(abs(residuals(LinearCO2_lowerF))~fitted(LinearCO2_lowerF)) )^2 

And the regression with weights

LinearCO2_lowerFw <- lm(CO2_pc_cmice1 ~ GDP_pc_cmice1_C + PolCiv_incPressFreedom_C +
                                        CorpInf_cmice1_C + Gov_cmicepos1_C + 
                                        LitGini_umice_C + PopDensity_cmice1_C +
                                        TradeOpen_cmice1_C + Urban_cmice1_C + 
                                        poly(Oil_coal_umice_C,2), 
                                        data = mydata_completemice2, 
                                        subset = IncomeL == "L",
                                        weights = wtsco2low, 
                                        na.action = na.omit)

(I have also tried na.exclude.)

Is anyone able to help?


1 Answer

3
votes

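The subset= argument of R modelling functions is applied to every variable in the model frame, including weights=. lm() therefore expects the weights to have one entry per row of the full data set and subsets them along with the data. Your weights vector already has the subset's length (31), so its length does not match the other variables and you get the error.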

Consider this example: the data frame has 30 rows, but only 20 are in the subset to be analysed, and I have only 20 weights. If I use the subset= argument, the weights get subsetted and there's an error.

Instead, you can use subset() on the data before passing it to lm(), and that works.

> d<-data.frame(y=rnorm(30),x=1:30)
> w<-rep(2,20)
> 
> lm(y~x,data=d, subset=x>10)

Call:
lm(formula = y ~ x, data = d, subset = x > 10)

Coefficients:
(Intercept)            x  
    -0.3161       0.0189  

> lm(y~x,data=d, subset=x>10, weights=w)
Error in model.frame.default(formula = y ~ x, data = d, subset = x > 10,  : 
  variable lengths differ (found for '(weights)')
> lm(y~x,data=subset(d, x>10),  weights=w)

Call:
lm(formula = y ~ x, data = subset(d, x > 10), weights = w)

Coefficients:
(Intercept)            x  
    -0.3161       0.0189  
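
Applied to the workflow in the question, the same fix might look like the sketch below. The data frame `d`, its columns, and the simulated values are made up for illustration, since the real `mydata_completemice2` is not available here. Only the weighting recipe (1 over the squared fitted values from regressing absolute residuals on fitted values) is taken from the question. Option B is an alternative that keeps subset= and instead pads the weights to the full number of rows. It relies on the subset being applied before na.action, so the NA padding in the excluded rows never reaches the fit.

```r
set.seed(1)

# Hypothetical stand-in for the question's data: 60 rows, 31 of them in group "L"
n <- 60
d <- data.frame(
  x       = runif(n, 1, 10),
  IncomeL = sample(rep(c("L", "H"), times = c(31, 29)))
)
d$y <- 1 + 2 * d$x + rnorm(n, sd = d$x)   # noise grows with x

# Step 1: unweighted fit on the subset
fit0 <- lm(y ~ x, data = d, subset = IncomeL == "L")

# Step 2: weights built as in the question; one weight per row of the subset
aux <- lm(abs(residuals(fit0)) ~ fitted(fit0))
w   <- 1 / fitted(aux)^2                  # length 31, not 60

# Option A: subset the data first, so the 31 weights line up with 31 rows
d_low <- subset(d, IncomeL == "L")
fit_a <- lm(y ~ x, data = d_low, weights = w)

# Option B: keep subset=, but pad the weights to one entry per row of d
w_full <- rep(NA_real_, nrow(d))
w_full[d$IncomeL == "L"] <- w             # residuals follow row order within the subset
fit_b <- lm(y ~ x, data = d, subset = IncomeL == "L", weights = w_full)

# Both routes should give the same coefficients
all.equal(coef(fit_a), coef(fit_b))
```

Option A is usually the simpler choice. Option B can be handy when the same full-length weights column is stored in the data frame and reused across several subsets.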