I am tuning the hyperparameters of a RandomForestClassifier with GridSearchCV. Here is my code.

#Import 'GridSearchCV', 'make_scorer', and the other names used below
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score, make_scorer

#Create the parameters list you wish to tune
parameters = {'n_estimators':[5,10,15]}

#Initialize the classifier
clf = GridSearchCV(RandomForestClassifier(), parameters)

#Make an f1 scoring function using 'make_scorer'
f1_scorer = make_scorer(f1_score)

#Perform grid search on the classifier using the f1_scorer as the scoring method
grid_obj = GridSearchCV(clf, param_grid=parameters, scoring=f1_scorer,cv=5)

print(clf.get_params().keys())

#Fit the grid search object to the training data and find the optimal parameters
grid_obj = grid_obj.fit(X_train_100,y_train_100)

So the issue is the following error: "ValueError: Invalid parameter max_features for estimator GridSearchCV. Check the list of available parameters with estimator.get_params().keys()."

I followed the advice in the error message, and the output of print(clf.get_params().keys()) is below. However, even when I copy and paste these names into my parameter dictionary, I still get an error. I've hunted around Stack Overflow, and most people use parameter dictionaries very similar to mine. Does anyone have an idea how to fix this? Thanks again!

dict_keys(['pre_dispatch', 'cv', 'estimator__max_features', 'param_grid', 'refit', 'estimator__min_impurity_split', 'n_jobs', 'estimator__random_state', 'error_score', 'verbose', 'estimator__min_samples_split', 'estimator__n_jobs', 'fit_params', 'estimator__min_weight_fraction_leaf', 'scoring', 'estimator__warm_start', 'estimator__criterion', 'estimator__verbose', 'estimator__bootstrap', 'estimator__class_weight', 'estimator__oob_score', 'iid', 'estimator', 'estimator__max_depth', 'estimator__max_leaf_nodes', 'estimator__min_samples_leaf', 'estimator__n_estimators', 'return_train_score'])

1 Answer

I think the problem is with the two lines:

clf = GridSearchCV(RandomForestClassifier(), parameters)
grid_obj = GridSearchCV(clf, param_grid=parameters, scoring=f1_scorer,cv=5)

What this is essentially doing is creating an object with a structure like:

grid_obj = GridSearchCV(GridSearchCV(RandomForestClassifier()))

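which is probably one more GridSearchCV than you want. The outer search tries to set the keys of your parameter dictionary directly on the inner GridSearchCV, which only exposes the forest's parameters under the estimator__ prefix (as your get_params() output shows), hence the "Invalid parameter" error. Passing the RandomForestClassifier itself to a single GridSearchCV should resolve it.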
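
A minimal sketch of the single-search version, assuming the goal is one grid search over n_estimators scored with F1. The synthetic data from make_classification is only a stand-in for X_train_100 and y_train_100 (which the question does not show), and the random_state values are added here just to make the run repeatable. The first few lines also check how the nested object names its parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic binary data standing in for the asker's training set
X_train_100, y_train_100 = make_classification(
    n_samples=100, n_features=8, random_state=0
)

parameters = {'n_estimators': [5, 10, 15]}

# Why the nested version fails: a GridSearchCV only exposes the forest's
# parameters under the 'estimator__' prefix, not under their bare names
nested = GridSearchCV(RandomForestClassifier(), parameters)
print('n_estimators' in nested.get_params())
print('estimator__n_estimators' in nested.get_params())

# The fix: pass the bare classifier, not another GridSearchCV
clf = RandomForestClassifier(random_state=0)

# Wrap the f1_score metric (not the scorer itself) with make_scorer
f1_scorer = make_scorer(f1_score)

# A single grid search over the classifier
grid_obj = GridSearchCV(clf, param_grid=parameters, scoring=f1_scorer, cv=5)
grid_obj = grid_obj.fit(X_train_100, y_train_100)

print(grid_obj.best_params_)
```

With the bare classifier, the keys in the parameter dictionary match clf.get_params() directly, so no estimator__ prefix is needed.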