
I'm interested in applying a jackknife analysis in order to quantify the uncertainty of the coefficients estimated by my logistic regression. I'm using glm(family='binomial') because my dependent variable is in 0-1 format.

My dataset has 76,000 observations, and I'm using 7 independent variables plus an offset. The idea is to split the data into, say, 5 random subsets and then obtain the 7 estimated parameters by dropping one subset at a time from the dataset. I can then estimate the uncertainty of the parameters.

I understand the procedure but I'm unable to do it in R.

This is the model that I'm fitting:

glm(f_ocur ~ altitud + UTM_X + UTM_Y + j_sin + j_cos + temp_res + pp +
             offset(log(1/off)), data = mydata, family = 'binomial')
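The delete-a-group jackknife described above can be sketched as follows. Since the question's data isn't available, this uses a small simulated stand-in (the names `sim`, `x1`, `x2`, and the two-predictor formula are placeholders); with the real data you would substitute `mydata` and the model formula shown above.

```r
# Delete-a-group jackknife sketch: split the rows into k random folds
# and refit the model, dropping one fold at a time.
set.seed(42)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.4 * sim$x1 - 0.2 * sim$x2))

k <- 5
fold <- sample(rep(1:k, length.out = n))  # random fold assignment

coefs <- sapply(1:k, function(g)
  coef(glm(y ~ x1 + x2, data = sim[fold != g, ], family = binomial))
)
# coefs: one column of parameter estimates per left-out fold
```

The spread of the columns of `coefs` across the k refits is what the jackknife uses to gauge parameter uncertainty.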

Does anyone have an idea of how can I make this possible?

1 Answer


Jackknifing a logistic regression model is computationally expensive, but an easy (if time-intensive) approach looks like this:

Formula <- f_ocur ~ altitud + UTM_X + UTM_Y + j_sin + j_cos + temp_res + pp +
  offset(log(1/off))
coefs <- sapply(1:nrow(mydata), function(i)
  coef(glm(Formula, data = mydata[-i, ], family = 'binomial'))
)

This gives you a matrix of leave-one-out coefficient estimates, one column per left-out observation. The (suitably scaled) covariance of these estimates then estimates the covariance matrix of the parameter estimates.

A significant speed-up can be had by using glm's workhorse function, glm.fit. You can go even further by linearizing the model: use one-step estimation, i.e. limit the Newton-Raphson algorithm to a single iteration. Jackknife SEs based on one-step estimators are still robust, unbiased, the whole bit.
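A minimal sketch of that one-step approach, again on simulated stand-in data: each leave-one-out fit is warm-started at the full-data estimate via glm.fit's `start` argument, and `control = list(maxit = 1)` caps Newton-Raphson at one iteration (glm.fit will warn about non-convergence, which is expected here, hence the suppressWarnings).

```r
set.seed(1)
n <- 200
X <- cbind(1, x1 = rnorm(n), x2 = rnorm(n))        # design matrix with intercept
y <- rbinom(n, 1, plogis(X %*% c(-0.2, 0.5, -0.3)))

# Full-data fit, used as the warm start for every leave-one-out refit
full <- glm.fit(X, y, family = binomial())

coefs <- sapply(seq_len(n), function(i)
  suppressWarnings(
    glm.fit(X[-i, ], y[-i], family = binomial(),
            start   = full$coefficients,   # warm start at full-data estimate
            control = list(maxit = 1)      # a single Newton-Raphson step
    )$coefficients
  ))
```

Because each refit starts one observation away from the full-data optimum, a single step gets very close to the fully iterated leave-one-out estimate at a fraction of the cost. An offset would be passed to glm.fit via its `offset` argument.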