I am trying out logistic regression on a data.frame (11359 rows, 137 columns). The data.frame contains y (the dependent variable) and 136 predictors (independent variables). All the variables are binary.
The formula I created based on the "my_data" data.frame is:
f <- as.formula(paste('y ~', paste(colnames(my_data)[c(3:52, 54:133, 138:143)], collapse = '+')))
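For reference, an equivalent and perhaps less error-prone way to build the same formula is base R's reformulate (a sketch; the column indices are copied from above):
predictors <- colnames(my_data)[c(3:52, 54:133, 138:143)]
f <- reformulate(predictors, response = "y")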
I applied glm, logistf and pmlr as follows:
glm(f, family = binomial(link = "logit"), data = my_data)
logistf(f, my_data)
pmlr(f, data = my_data, method = "likelihood", joint = TRUE)
The glm function estimates some parameters but gives a warning:
glm.fit: fitted probabilities numerically 0 or 1 occurred
I figured out that this message is caused by a separation issue, so I tried the logistf and pmlr functions.
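As a diagnostic, the detectseparation package can reportedly confirm which predictors cause the separation; it plugs into glm as an alternative fitting method (a sketch, assuming the package is available; I have not run it on this data):
library(detectseparation)
sep <- glm(f, family = binomial(link = "logit"), data = my_data,
           method = "detect_separation")
sep  # infinite coefficient estimates identify the separating predictors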
With logistf, I got no results and no error after 50 hours, so I decided to terminate the process (CPU usage 23-27%, RAM usage approx. 1100 MB during the first 10 hours, then 2-3 MB).
With pmlr, I got this error:
Error: cannot allocate vector of size 28.9 Gb
To check whether the number of predictors was the problem, I tried logistf and pmlr with only 10 of the 137 variables and got the same behavior: logistf ran "forever", and pmlr gave the same type of error with a different vector size (even bigger than before; if I recall correctly, approx. 45 Gb).
Should I upgrade my laptop's RAM to perform this computation, look for other functions (are there other packages for penalized logistic regression?), or is this a different kind of problem, e.g. too many variables?
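One candidate I have not tried yet is brglm2, another Firth-type (bias-reduced) implementation whose fitter can be passed directly to glm (a sketch, assuming the package works on this R version):
library(brglm2)
fit_br <- glm(f, family = binomial(link = "logit"), data = my_data,
              method = "brglmFit")
summary(fit_br)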
Windows 10 x64; Processor: i3 2.4 GHz; RAM: 8.00 GB; R version: x64 3.4.0; RStudio: 1.0.143.
Comments:
speedglm: cran.r-project.org/web/packages/speedglm/speedglm.pdf – Marco Sandri
glmnet – user20650
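A sketch of the glmnet suggestion above (the ridge penalty and the lambda choice are my assumptions, not part of the comment):
library(glmnet)
x <- model.matrix(f, data = my_data)[, -1]  # predictor matrix, intercept column dropped
y <- my_data$y
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0)  # alpha = 0 gives a ridge penalty
coef(cvfit, s = "lambda.min")  # coefficients at the cross-validated lambda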