I'm trying to use machine learning to predict engagement time for an article. My X dataset looks like this:
   _text_word_length  _title_char_length  _title_word_length  _text_char_length
0               1306                  53                   7               8056
1               1075                  62                  11               6127
and my target Y values are just floats representing engagement time.
I use scikit-learn as follows:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, KFold
import numpy as np
clf = RandomForestRegressor(n_jobs=-1, n_estimators=250, max_features=0.8, verbose=2)
score = cross_val_score(estimator=clf, X=X1, y=Y1, cv=KFold(n_splits=5, random_state=100),
                        n_jobs=-1, scoring="neg_mean_squared_error")
np.mean([np.sqrt(-x) for x in score])
Because I'm using verbose mode, it prints progress for each tree as the random forest is built. It gets through almost all of the trees, and then I get this:
JoblibException: JoblibException
___________________________________________________________________________
Multiprocessing exception:
Then there's a ton of text (I won't reproduce it here, but I can on request). At the very end, I see this:
ValueError: I/O operation on closed file
I'm totally lost because very similar code worked yesterday, so I'm not sure what I'm doing wrong.
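To help narrow this down, here is a minimal sketch of the same pipeline with verbose output turned off and no parallelism. It rests on an unconfirmed guess that the error comes from worker processes writing verbose output to a stream that has already been closed. The `rmse_cv` helper and the random `X_demo` / `y_demo` arrays are hypothetical stand-ins, not my real data. I also added `shuffle=True`, since as far as I know `random_state` only takes effect when `KFold` shuffles, and newer scikit-learn versions reject it otherwise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def rmse_cv(X, y, n_estimators=250, n_jobs=1):
    """Return the mean RMSE over 5 folds (hypothetical helper, verbose off)."""
    clf = RandomForestRegressor(
        n_estimators=n_estimators,
        max_features=0.8,
        n_jobs=n_jobs,  # parallelism inside the forest only
        verbose=0,      # assumption: verbose printing from workers triggers the error
    )
    # shuffle=True is an addition; random_state is only used when shuffling
    cv = KFold(n_splits=5, shuffle=True, random_state=100)
    scores = cross_val_score(
        clf, X, y, cv=cv, scoring="neg_mean_squared_error", n_jobs=1
    )
    # scores are negative MSE values, so negate before taking the square root
    return float(np.mean(np.sqrt(-scores)))


# Synthetic stand-in data with the same shape as my features (4 columns)
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 10000, size=(200, 4)).astype(float)
y_demo = rng.random(200) * 300.0

print(rmse_cv(X_demo, y_demo, n_estimators=20))
```

If this runs cleanly, restoring `verbose=2` and `n_jobs=-1` one at a time should show which setting brings the exception back.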
Any ideas?
Thanks!