I'm doing a bit of machine learning: I'm trying to predict engagement time for an article. My X dataset looks like this:

    _text_word_length  _title_char_length  _title_word_length  _text_char_length
0                1306                  53                   7               8056
1                1075                  62                  11               6127

and my target Y values are just floats representing engagement time.

I use scikit-learn as follows:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, KFold
import numpy as np
clf = RandomForestRegressor(n_jobs=-1, n_estimators=250, max_features = 0.8, verbose = 2)
score = cross_val_score(estimator = clf, X = X1, y = Y1, cv = KFold(n_splits = 5, random_state = 100), n_jobs = -1, \
                        scoring = "neg_mean_squared_error")
np.mean([np.sqrt(-x) for x in score])

Because I'm using verbose mode, it outputs all the trees for the random forest. It gets through almost all of the trees and then I get this:

JoblibException: JoblibException
___________________________________________________________________________
Multiprocessing exception:

Then there's a ton of text (won't reproduce it here but can upon request). At the very end, I see this:

ValueError: I/O operation on closed file

I'm totally lost because very similar code worked yesterday so I'm not sure what I'm doing incorrectly.

Any ideas?

Thanks!

1 Answer

Can you try something like the following? Put all your code after if __name__ == '__main__': with the appropriate indentation.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, KFold
import numpy as np

if __name__ == '__main__':

    clf = RandomForestRegressor(n_estimators=250, max_features = 0.8, verbose = 2)
    score = cross_val_score(estimator = clf, X = X1, y = Y1, cv = KFold(n_splits = 5, random_state = 100), n_jobs = -1,scoring = "neg_mean_squared_error")
    np.mean([np.sqrt(-x) for x in score])

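Note that 1) in cross_val_score, the backslash (\) before scoring=... is only a line continuation. The comma is what separates the arguments, and inside parentheses the backslash isn't needed.

2) set n_jobs = -1 only once, inside cross_val_score, and not on the RandomForestRegressor as well.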
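
For reference, here is a minimal self-contained sketch of that layout, in case it helps to test outside your own data. The synthetic X1/Y1 arrays and the helper name rmse_per_fold are my own placeholders, not from your question. I also added shuffle=True to KFold as an assumption, since newer scikit-learn versions reject random_state there unless shuffling is enabled. I can't promise this removes the closed-file error in every environment, but it keeps the parallelism in one place and behind the main guard.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def rmse_per_fold(X, y, n_estimators=250, n_jobs=-1):
    """Return the RMSE of each of the 5 cross-validation folds."""
    # Parallelism is requested once, in cross_val_score, not on the forest too.
    clf = RandomForestRegressor(n_estimators=n_estimators, max_features=0.8,
                                random_state=0)
    # shuffle=True is an assumption added here: newer scikit-learn raises an
    # error when random_state is set on KFold without shuffling.
    cv = KFold(n_splits=5, shuffle=True, random_state=100)
    neg_mse = cross_val_score(clf, X, y, cv=cv, n_jobs=n_jobs,
                              scoring="neg_mean_squared_error")
    # Scores are negated MSE values, so flip the sign before the square root.
    return np.sqrt(-neg_mse)


if __name__ == "__main__":
    # Hypothetical stand-in for X1 / Y1: 200 rows, 4 numeric features.
    rng = np.random.default_rng(0)
    X1 = rng.integers(1, 10_000, size=(200, 4)).astype(float)
    Y1 = X1[:, 0] * 0.01 + rng.normal(size=200)
    print(rmse_per_fold(X1, Y1).mean())
```

If this runs cleanly but your original still fails, try n_jobs=1 in your own code to check whether the error comes from the worker processes.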