I was wondering whether there is a way to post-process predicted labels in sklearn.
My training data has ground-truth labels of 0 and 1.
However, I am using Isolation Forest, which predicts:
-1 for outliers, equivalent to ground-truth label 1
1 for normal data, equivalent to ground-truth label 0
If I were to write a function to post-process the prediction, it would be very simple:
def process_anomaly_labels(raw_y_pred):
    # Map Isolation Forest output to the ground-truth convention:
    # -1 (outlier) -> 1, 1 (normal) -> 0
    y_pred = raw_y_pred.copy()
    y_pred[raw_y_pred == 1] = 0
    y_pred[raw_y_pred == -1] = 1
    return y_pred
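The same mapping can also be done in a single vectorized step. This is a minimal sketch, assuming the predictions contain only -1 and 1; it returns a new array, so no explicit copy is needed:

```python
import numpy as np


def process_anomaly_labels(raw_y_pred):
    # -1 (outlier) -> 1; every other value (1, normal) -> 0
    return np.where(np.asarray(raw_y_pred) == -1, 1, 0)


print(process_anomaly_labels(np.array([1, -1, 1, -1])))
```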
But I don't know how to post-process the predicted labels when I fine-tune the model with RandomizedSearchCV:
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import RandomizedSearchCV

# Hyperparameter search space for fine-tuning
forest_params = {
    "n_estimators": [50, 200, 800],
    "max_samples": [1000, 4000, 16000, 64000, 120000],
    "max_features": [1, 5, 15, 30],
    "contamination": [0.001, 0.1, 0.2, 0.5],
}

# scoring="f1" compares the raw -1/1 predictions with the 0/1 labels
forest_grid_search = RandomizedSearchCV(
    IsolationForest(),
    param_distributions=forest_params,
    scoring="f1",
    n_jobs=8,
    n_iter=50,
    cv=3,
    verbose=2,
)
forest_grid_search.fit(X_train_trans, y_train)
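One possible workaround, sketched here under the assumption that subclassing the estimator is acceptable: a hypothetical `ZeroOneIsolationForest` overrides `predict` so the estimator itself emits 0/1 labels, and the built-in `scoring="f1"` can then stay unchanged. Because the subclass does not redefine `__init__`, `get_params` and `clone` (which `RandomizedSearchCV` relies on) still see the parent's hyperparameters. The toy data is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


class ZeroOneIsolationForest(IsolationForest):
    """Hypothetical IsolationForest variant: predict() returns 1 for outliers, 0 for inliers."""

    def predict(self, X):
        # The parent returns -1 for outliers and 1 for inliers;
        # remap to the 0/1 ground-truth convention
        raw = super().predict(X)
        return np.where(raw == -1, 1, 0)


# Toy data for illustration only: 100 inliers near the origin, 5 distant points
rng = np.random.RandomState(0)
X_demo = np.vstack([rng.normal(size=(100, 2)), rng.uniform(6, 8, size=(5, 2))])
model = ZeroOneIsolationForest(contamination=0.05, random_state=0).fit(X_demo)
print(np.unique(model.predict(X_demo)))
```

In the search above, `IsolationForest()` would then be replaced by `ZeroOneIsolationForest()`.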
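I cannot simply convert the ground-truth labels to match the predicted labels (-1/1), because I want to evaluate with the binary F1 score, which by default treats label 1 as the positive class; after such a conversion it would score the normal class instead of the outliers.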
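Another option that keeps the ground-truth labels as 0/1 is a custom scorer. This is a minimal sketch, not the only approach: `make_scorer` from `sklearn.metrics` wraps a function (the name `anomaly_f1` is hypothetical) that remaps the raw predictions before calling the binary `f1_score`. The toy data, the reduced parameter grid, and the stratified, shuffled CV splitter are illustrative assumptions; in the original search you would pass `scoring=anomaly_f1_scorer` together with `forest_params`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold


def anomaly_f1(y_true, raw_y_pred):
    # Map IsolationForest output (-1 outlier, 1 normal) to the
    # ground-truth convention (1 outlier, 0 normal) before scoring
    y_pred = np.where(np.asarray(raw_y_pred) == -1, 1, 0)
    # Binary F1 with the default pos_label=1, i.e. the outlier class
    return f1_score(y_true, y_pred)


anomaly_f1_scorer = make_scorer(anomaly_f1)

# Toy data for illustration only: 200 inliers near the origin, 10 distant outliers
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(200, 2)), rng.uniform(6, 8, size=(10, 2))])
y = np.r_[np.zeros(200, dtype=int), np.ones(10, dtype=int)]

search = RandomizedSearchCV(
    IsolationForest(random_state=0),
    param_distributions={"n_estimators": [50, 100], "contamination": [0.01, 0.05, 0.1]},
    scoring=anomaly_f1_scorer,
    n_iter=4,
    # Shuffled, stratified folds so every fold contains some outliers
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)  # IsolationForest ignores y; only the scorer uses it
print(search.best_params_, search.best_score_)
```

The scorer receives the output of the estimator's `predict`, so the remapping happens only at evaluation time and the model itself is left untouched.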