
I'm fitting a multiple linear regression model with 6 predictors (3 continuous and 3 categorical). The residuals vs. fitted plot shows heteroscedasticity, and bptest() confirms it.

summary of sales_lm

residuals vs. fitted plot

I also calculated the RMSE (root mean squared error) for my train and test data, as shown below:

sqrt(mean((sales_train_lm_pred - sales_train$SALES)^2))

2 3533.665

sqrt(mean((sales_test_lm_pred - sales_test$SALES)^2))

2 3556.036
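For reference, a minimal sketch of the RMSE calculation. The helper name `rmse` and the toy vectors are hypothetical and only illustrate the arithmetic: the errors have to be squared before averaging, because `sqrt(mean(e)^2)` gives the absolute mean error rather than the RMSE.

```r
# Hypothetical helper: root mean squared error of predictions vs. actual values.
rmse <- function(pred, actual) {
  sqrt(mean((pred - actual)^2))  # square each error first, then average
}

# Toy vectors (not the sales data), used only to show the difference.
pred   <- c(10, 12, 15)
actual <- c(11, 12, 13)

rmse_value     <- rmse(pred, actual)              # root of the mean squared error
abs_mean_error <- sqrt(mean(pred - actual)^2)     # absolute value of the mean error

print(c(rmse = rmse_value, abs_mean_error = abs_mean_error))
```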

I tried fitting a glm() model with weights, but that still didn't remove the heteroscedasticity.

glm.test3 <- glm(SALES ~ ., weights = 1 / sales_fitted$.resid^2, family = gaussian(link = "identity"), data = sales_train)

The residuals vs. fitted plot for glm.test3 looks weird.

glm.test3 plot

Could you please advise what I should do next?

Thanks in advance!

The first suggestion is to transform your data to the log scale and repeat the same analyses. The second is to take first-order differences and repeat the same analyses. – Vitali Avagyan

1 Answer


Observing heteroscedasticity in your data means that the variance of the errors is not constant across observations. You can try the following:

1) Apply the one-parameter Box-Cox transformation (of which the log transform is a special case) with a suitable lambda to one or more variables in the data set. The optimal lambda can be found by maximizing its log-likelihood function. Take a look at MASS::boxcox.

2) Experiment with your feature set (drop predictors, add new variables).

3) Use weighted linear regression.
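As a rough sketch of the Box-Cox and weighted-regression suggestions, the code below uses simulated data in place of your sales data. The variable names (`dat`, `y`, `x`, `g`), the lambda grid, and the choice to estimate the weights from a regression of the log squared residuals on the predictors are all assumptions made for illustration, not the only way to do this.

```r
library(MASS)  # provides boxcox(); MASS ships with standard R installations

set.seed(42)
n <- 500

# Simulated stand-in data (an assumption): the response is log-linear in the
# predictors, so on the original scale its spread grows with its mean.
x <- runif(n, 1, 10)
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y <- exp(1 + 0.3 * x + 0.2 * (g == "b") + rnorm(n, sd = 0.4))
dat <- data.frame(y = y, x = x, g = g)

fit_ols <- lm(y ~ x + g, data = dat)

# Box-Cox: evaluate the profile log-likelihood on a grid of lambda values
# and pick the grid point where it is largest.
bc <- boxcox(y ~ x + g, data = dat,
             lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Box-Cox transform of the response; lambda = 0 corresponds to log(y).
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
dat$y_bc <- bc_transform(dat$y, lambda_hat)
fit_bc <- lm(y_bc ~ x + g, data = dat)

# Weighted least squares: model the log squared OLS residuals as a function
# of the predictors, then weight each observation by the inverse of its
# estimated variance (smoother than using 1 / residual^2 directly).
var_fit <- lm(log(residuals(fit_ols)^2) ~ x + g, data = dat)
w <- 1 / exp(fitted(var_fit))
fit_wls <- lm(y ~ x + g, data = dat, weights = w)

print(lambda_hat)
print(coef(fit_wls))
```

After refitting, re-check the residuals vs. fitted plot (and the Breusch-Pagan test) on the new model. For the weighted fit, look at the weighted residuals, for example `residuals(fit_wls) * sqrt(w)`, because the raw residuals are still expected to fan out.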