2
votes

I'm trying to train sklearn's logistic regression on a huge dataset. I've set the parameter n_jobs=-1 (and have also tried n_jobs = 5, 10, ...), but when I open htop, I can see that it still uses only one core.

Does it mean that logistic regression just ignores the n_jobs parameter?

How can I fix this? I really need this process to run in parallel...

P.S. I am using sklearn 0.17.1


2 Answers

1
votes

Which parallel backend is used also depends on the solver. If you want to utilize multiple cores, the multiprocessing backend is needed.

But a solver like 'sag' can only use the threading backend.

Also, in most cases the run can look stuck on one core because a lot of pre-processing happens before any parallel work starts.

0
votes

There can be several reasons for this; it is worth reading the scikit-learn documentation for n_jobs carefully. Can you try this set of parameters:

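    from sklearn.linear_model import LogisticRegression

    # multi_class='ovr' fits one binary problem per class;
    # n_jobs is documented as parallelizing over those classes.
    logit = LogisticRegression(penalty='l2',
                               random_state=42,
                               C=0.2,
                               n_jobs=-1,
                               solver='sag',
                               multi_class='ovr',
                               max_iter=200,
                               verbose=10)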

Note that it can take a minute or two before the multiple threads start up.
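If the estimator's own n_jobs still does not spread the work, one alternative worth trying is to make the one-vs-rest split explicit with OneVsRestClassifier, which takes its own n_jobs and fits one binary model per class. This is only a minimal sketch, not the method from the question: the synthetic data from make_classification and the class count of 4 are assumptions for illustration, and whether you see a speed-up depends on the number of classes and on your data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the real dataset (assumption: 4 classes).
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, n_classes=4,
                           random_state=42)

# Each class gets its own binary logistic regression; the wrapper's
# n_jobs controls how many of those fits may run at the same time.
base = LogisticRegression(C=0.2, solver='sag', max_iter=200,
                          random_state=42)
clf = OneVsRestClassifier(base, n_jobs=2)
clf.fit(X, y)

pred = clf.predict(X)
print(len(clf.estimators_), "binary models fitted")
```

With only two classes there is a single binary problem, so there is nothing to run in parallel this way.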