4
votes

I am currently building a regression model which helps explain sales using certain factors like income,temperature etc.On checking the residual plot after regression , the residuals are heteroscedastic.

To account for heteroscedasticity , I have made use of vcovHC() and coeftest( ) in R which can be used to re-calculate the standard errors along with their p-values under the assumption of Heteroscedasticity. But these functions return NA values and hence all corresponding p-values are also NAs. What could be the reason for this issue and how do I resolve it?The code it as below:

fit_p<-lm(formula = final_list_p, data = new_data_p)
s_p<-summary(fit_p)

The output

The summary statistics for the output of linear regression is:

Residuals:                  
Min      1Q  Median      3Q     Max                     
-244190  -60770   -5759   59730  311108         


Coefficients:                   
Estimate                    
(Intercept)                 
        Std. Error  t   value   Pr(>|t|)    
var1    -3.36E+05   1.77E+05    -1.893  0.059026    .
var2    -2.90E+04   4.96E+03    -5.86   8.97E-09    ***
var3    -1.75E+05   8.93E+04    -1.958  0.050834    .
var4    -4.62E+00   2.80E+00    -1.653  0.098975    .
var5    2.39E+01    7.85E+00    3.04    0.002503    **
var6    -6.32E+04   1.08E+05    -0.588  0.556682    
var7    -5.38E+03   3.69E+04    -0.146  0.884204    
var8    6.03E+04    6.53E+04    0.923   0.356275    
var9    3.33E-01    4.75E-02    7.011   8.76E-12    ***
var10   -7.94E+04   2.33E+05    -0.34   0.73381 
var11   1.06E+05    1.08E+05    0.986   0.324424    
var12   -1.06E+04   4.41E+03    -2.39   0.017275    *
var14   5.44E+03    8.80E+02    6.182   1.43E-09    ***
var16   9.12E+04    7.34E+04    1.242   0.21481 
var18   1.78E+04    8.41E+04    0.211   0.832674    
var19   -1.75E+05   1.18E+05    -1.487  0.137787    
var20   4.19E+03    6.95E+02    6.023   3.58E-09    ***
var25   2.96E+00    4.82E-01    6.146   1.76E-09    ***


Residual standard error: 87850 on 447 degrees of freedom                
Multiple R-squared:  0.6144,    Adjusted R-squared:0.5958               
F-statistic: 39.57 on 18 and 447 DF,  p-value: <2.2e-16                 

When I check the residual plot,they are heteroskedastic.To account for this issue , the standard errors are recalculated using robust standard errors (vcovHC::sandwich) The results after performing the coeftest::lmtest is as follows:

s_p$coefficients <- unclass(coeftest(fit_p, vcov. = vcovHC))
        Estimate    Std.Error t-value Pr(>|t|)
Intercept-3.36E+05  NA          NA    NA
var1    -2.90E+04   NA          NA    NA
var2    -1.75E+05   NA          NA    NA
var3    -4.62E+00   NA          NA    NA
var4    2.39E+01    NA          NA    NA
var5    -6.32E+04   NA          NA    NA
var6    -5.38E+03   NA          NA    NA
var7    6.03E+04    NA          NA    NA
var8    3.33E-01    NA          NA    NA
var9    -7.94E+04   NA          NA    NA
var10   1.06E+05    NA          NA    NA
var11   -1.06E+04   NA          NA    NA
var12   5.44E+03    NA          NA    NA
var14   9.12E+04    NA          NA    NA
var16   1.78E+04    NA          NA    NA
var18   -1.75E+05   NA          NA    NA
var19   4.19E+03    NA          NA    NA
var20   2.96E+00    NA          NA    NA
var25   3.29E+03    NA          NA    NA
2
We need to look at some code... maybe you alread have NA values in your estimated lm-model?Helix123
I checked the data points of my dataset and they do not have any NA values.The beta estimates of the coefficients are being estimated but vcovHC( lm object) returns NA valuesAsha
Could you provide a reproducible example, so we can have a closer look at it?horseoftheyear

2 Answers

2
votes

You should specificate a type of vcovHC. For example:

coeftest(fit_p,vcov.=vcovHC(fit_p,type="HC1"))

This particular variant gives same results as in Eviews.

1
votes

There is a try why the coeftest is not working with a robust vcov: If your model.matrix contrains very large values as well as very small values, the algorithm might not be able to do the computation numerically stable. Thus, have a look at model.matrix(formula , data=new_data_p) if this is the case. If so, try rescaling some variables in your (p)data.frame before model estimation (e.g. multiply oder divide by 100 oder 1000 [also log() makes sense sometimes). Take care, the interpretation of the coefficients changes due to the change of scales!