
I'm trying to understand why my code is taking several days to run and how I can speed up the next iteration. It is now on its third day, and each step still improves the AIC only marginally. The last few AIC values have been 18135.38, 18187.43, and 18243.13. I currently have 33 covariates in the model. The "none" row is 12th from the bottom of the latest step table, so there are still many covariates left to add.

The data has ~610K observations and ~1600 variables. The outcome variables and covariates are mostly binary. I chose my covariates by running univariate logistic regressions and adjusting the p-values with the Holm procedure (alpha = 0.05). No interaction terms are included.

The code I've written is here:

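# `covariates` is a placeholder for the character vector naming the 157 selected covariates
full_formula <- reformulate(covariates, response = "outcome")
intercept_only <- glm(outcome ~ 1, data = data, family = "binomial")
full.model <- glm(full_formula, data = data, family = "binomial")
forward_step_model <- step(intercept_only, direction = "forward", scope = formula(full.model))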
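For anyone who wants to reproduce the call pattern without my data, here is a minimal sketch on simulated data. Everything in it (the data frame `X`, the covariate names `x1` to `x10`, the effect sizes) is made up for illustration. It also passes the upper scope to `step()` directly as a formula, which `?step` documents as allowed, so the full model is never fitted up front.

```r
set.seed(42)
n <- 2000
p <- 10

# Simulated binary covariates x1..x10 (hypothetical stand-ins for the real ones)
X <- as.data.frame(matrix(rbinom(n * p, 1, 0.3), nrow = n))
names(X) <- paste0("x", seq_len(p))

# In this toy setup only x1 and x2 truly affect the outcome
X$outcome <- rbinom(n, 1, plogis(-1 + 1.5 * X$x1 - 1.2 * X$x2))

# Upper scope given as a formula; no full model is fitted beforehand
upper <- reformulate(paste0("x", seq_len(p)), response = "outcome")

intercept_only <- glm(outcome ~ 1, data = X, family = "binomial")
forward_fit <- step(intercept_only, direction = "forward",
                    scope = upper, trace = 0)

# Names of the covariates that forward selection kept
selected <- attr(terms(forward_fit), "term.labels")
print(selected)
```

Wrapping the `step()` call in `system.time()` on a subsample of the real data would give a rough idea of how the run time grows with the number of rows and candidate covariates.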

I'm hoping to run the same code on a different outcome variable, with double the number of covariates identified in the same way as above, but I'm worried it will take even longer to process. I see there are both the step and stepAIC functions for stepwise regression. Is there an appreciable difference between these functions? Are there any other ways of doing this? Is there any way to speed up the processing?

Why not ridge/lasso/elasticnet with glmnet? It will be much faster and probably work better too ... - Ben Bolker
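A minimal sketch of what the lasso suggestion could look like with glmnet, assuming the package is installed. The data are simulated and every name here (`x`, `y`, `x1` to `x30`, the effect sizes) is hypothetical; this is an illustration, not a recipe for the data in the question.

```r
library(glmnet)

set.seed(1)
n <- 2000
p <- 30

# Simulated binary covariate matrix with column names x1..x30
x <- matrix(rbinom(n * p, 1, 0.3), nrow = n,
            dimnames = list(NULL, paste0("x", seq_len(p))))

# In this toy setup only x1 and x2 truly affect the outcome
y <- rbinom(n, 1, plogis(-1 + 1.5 * x[, 1] - 1.2 * x[, 2]))

# Lasso-penalised logistic regression (alpha = 1), with the penalty
# strength lambda chosen by 10-fold cross-validation (the default)
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at the more conservative "lambda.1se" choice;
# covariates with a non-zero coefficient are the ones retained
coefs <- coef(cv_fit, s = "lambda.1se")
nonzero <- coefs[, 1] != 0
selected <- setdiff(names(nonzero)[nonzero], "(Intercept)")
print(selected)
```

Setting `alpha = 0` gives ridge and values in between give the elastic net. glmnet takes a numeric matrix rather than a formula, so a data frame with factor columns would first need converting, for example with `model.matrix()`.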
The thinking was that stepwise would be more intuitive for explaining what we're doing. Also, I was told that ridge/lasso/elasticnet are uncommon ways of doing things in the field, so I didn't want to raise too many eyebrows when publishing comes around - thou
Since stepwise is inferior to many other approaches, I don't know how much effort people will have put into making it efficient. You could try the bigstep package: cran.r-project.org/web/packages/bigstep/vignettes/bigstep.html - Ben Bolker
Thank you, Ben! I found some time to chat with some faculty, and there was more willingness to move towards lasso. I'll post a question if I find that challenging! I'll take a look at bigstep as well. - thou
On a side note, is there a Stack Overflow way of dealing with this type of question (e.g. one that is not a good question or has no clear answer)? - thou