My aim is to obtain a linear regression model of a dataset, and its associated residuals, after removing outliers.
Using the 'iris' data set to illustrate:
This is the original model, with no observations removed:
(MODEL1)
library(dplyr)
library(magrittr)
library(broom)
iris %>%
  do(tidy(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, .)))
               term   estimate  std.error statistic      p.value
1       (Intercept)  2.3903891 0.26226815  9.114294 5.942826e-16
2       Sepal.Width  0.4322172 0.08138982  5.310458 4.025982e-07
3      Petal.Length  0.7756295 0.06424566 12.072869 1.151112e-23
4 Speciesversicolor -0.9558123 0.21519853 -4.441537 1.759999e-05
5  Speciesvirginica -1.3940979 0.28566053 -4.880261 2.759618e-06
But I want to refit the model with some outliers removed (based on Cook's distance, .cooksd), i.e.:
(MODEL2)
iris %>%
  do(augment(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, .))) %>%
  filter(.cooksd < 0.03) %>%
  do(tidy(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, .)))
               term   estimate  std.error statistic      p.value
1       (Intercept)  2.3927287 0.23718040 10.088223 2.875549e-18
2       Sepal.Width  0.4150542 0.07374143  5.628508 9.775805e-08
3      Petal.Length  0.8035635 0.05975821 13.446914 7.229176e-27
4 Speciesversicolor -0.9858935 0.19651867 -5.016793 1.589618e-06
5  Speciesvirginica -1.4841365 0.26399083 -5.621924 1.008374e-07
Saving these models:
# Model 2: refit after keeping only observations with Cook's distance < 0.03
lm_model2 <- iris %>%
  do(augment(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, .))) %>%
  filter(.cooksd < 0.03) %>%
  lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, .)

# Model 1: fit on all observations
lm_model1 <- iris %>%
  lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, .)
Having done that, is it possible to obtain the regression residuals for the whole dataset based on the second model?
The only solution I can think of is to use the coefficients of model 2 to calculate these indirectly, i.e.:
Residual = Sepal.Length - (2.3927287 + 0.4150542 * Sepal.Width + 0.8035635 * Petal.Length - 0.9858935 * Speciesversicolor - 1.4841365 * Speciesvirginica)
where Speciesversicolor and Speciesvirginica are 0/1 indicators for the species.
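For what it's worth, the hand calculation can be written without typing the coefficients out. This is only a sketch under a few assumptions: the two models are refit with base R so the block runs on its own, cooks.distance() is taken to be the same quantity that broom reports as .cooksd, the residual is taken as observed minus fitted, and the names fit1, fit2, X, fitted_by_hand and resid_by_hand are mine.

```r
# Assumption: refit with base R so the block is self-contained.
form <- Sepal.Length ~ Sepal.Width + Petal.Length + Species
fit1 <- lm(form, data = iris)
fit2 <- lm(form, data = iris[cooks.distance(fit1) < 0.03, ])

# Design matrix for all rows of iris: intercept, the two numeric predictors,
# and the 0/1 dummy columns Speciesversicolor and Speciesvirginica.
X <- model.matrix(form, data = iris)

# Fitted values from the model 2 coefficients, then observed minus fitted.
fitted_by_hand <- drop(X %*% coef(fit2))
resid_by_hand  <- iris$Sepal.Length - fitted_by_hand
head(resid_by_hand)
```

The columns of model.matrix() come out in the same order as coef(fit2), so the matrix product reproduces the formula term by term.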
Is there a better way? Something similar to:
residuals <- obtain_residuals(iris, lm_model2)
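One candidate for such a call might be predict() with newdata, which returns fitted values for every row of a data frame, so the residuals are the observed response minus those predictions. A minimal sketch, assuming the models are refit with base R (cooks.distance() standing in for broom's .cooksd) and using my own names fit1, fit2, keep and res_all:

```r
# Assumption: refit with base R so the block is self-contained.
form <- Sepal.Length ~ Sepal.Width + Petal.Length + Species
fit1 <- lm(form, data = iris)
keep <- cooks.distance(fit1) < 0.03          # rows retained for model 2
fit2 <- lm(form, data = iris[keep, ])

# Residuals for every row of iris under the trimmed model:
# observed response minus the prediction from fit2.
res_all <- iris$Sepal.Length - predict(fit2, newdata = iris)

length(res_all)   # one residual per row of iris

# For the retained rows this agrees with residuals(fit2).
all.equal(unname(res_all[keep]), unname(residuals(fit2)))
```

Note that residuals(fit2) on its own only covers the rows that survived the filter, which is why predict() on the full data is used here.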
Many thanks.
lm model object? – Hong Ooi
lm_model1 and lm_model2 are model summaries as dataframes and not aligned to original dataset (N=150). – Parfait