1
votes

I have a lm object and I would like to bootstrap only its standard errors. In practice I want to use only part of the sample (with replacement) at each replication and get a distribution of standard erros. Then, if possible, I would like to display the summary of the original linear regression but with the bootstrapped standard errors and the corresponding p-values (in other words same beta coefficients but different standard errors).

Edited: In summary I want to "modify" my lm object by having the same beta coefficients of the original lm object that I ran on the original data, but having the bootstrapped standard errors (and associated t-stats and p-values) obtained by computing this lm regression several times on different subsamples (with replacement).

So my lm object looks like

    Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      3.812793   0.095282  40.016  < 2e-16 ***
x                                -0.904729   0.284243  -3.183  0.00147 ** 
z                                0.599258   0.009593  62.466  < 2e-16 ***
x*z                              0.091511   0.029704   3.081  0.00208 ** 

but the associated standard errors are wrong, and I would like to estimate them by replicating this linear regression 1000 times (replications) on different subsample (with replacement).

Is there a way to do this? can anyone help me?

Thank you for your time. Marco

1
Would you like to bootstrap the standard errors from the lm object or from the original data? Please edit the question with that important information. - Rui Barradas
I want to run the lm object 1000 times on different subsample each time (with replacement) in order to get new standard errors which are going to replace the ones (wrong) in the original lm object. - Marco Mello

1 Answers

1
votes

What you ask can be done following the line of the code below.
Since you have not posted an example dataset nor the model to fit, I will use the built in dataset mtcars an a simple formula with two continuous predictors.

library(boot)

boot_function <- function(data, indices, formula){
  d <- data[indices, ]
  obj <- lm(formula, d)
  coefs <- summary(obj)$coefficients
  coefs[, "Std. Error"]
}

set.seed(8527)

fmla <- as.formula("mpg ~ hp * cyl")
seboot <- boot(mtcars, boot_function, R = 1000, formula = fmla)

colMeans(seboot$t)
##[1] 6.511530646 0.068694001 1.000101450 0.008804784

I believe that it is possible to use the code above for most needs with numeric response and predictors.