
I am training a random forest model to predict title clusters. When I run it in a notebook, the predicted clusters are correct. But after I load the random forest model in a Flask app, the prediction is the same for every input. Do you have any suggestions? Thanks.

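import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

feature_dim = 2 ** 10
vectorizer = TfidfVectorizer(max_features=feature_dim)

# Fit the vectorizer once on the training text and keep the result
text = df['text'].values
X = vectorizer.fit_transform(text)

rf_model = RandomForestClassifier(n_estimators=100)
# X1_train / y1_train: the training split (its creation is not shown here)
rf_model.fit(X1_train, y1_train)

# Only the model is saved at this point, not the fitted vectorizer
with open('rf_model.sav', 'wb') as f:
    pickle.dump(rf_model, f)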

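# Load the saved model back
with open('rf_model.sav', 'rb') as f:
    rf_model = pickle.load(f)

titles = [
    "title_1",
    "title_2",
]

# Vectorize the new titles and predict their clusters
X_ti = vectorizer.transform(titles)
y_rf = rf_model.predict(X_ti)
print(y_rf)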

Results look like: [8 8 8 8 8 8 8]

Is this caused by not dumping the fitted TF-IDF vectorizer?


1 Answer


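The problem is resolved by dumping the fitted vectorizer as well and loading it in the Flask app. The model only gives meaningful predictions when new titles are transformed with the same vocabulary and IDF weights it was trained on; a vectorizer that is refit (or fit on different text) in the app produces features the model does not recognize.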
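
A minimal sketch of what that could look like, assuming scikit-learn and pickle as in the question. The toy texts and labels, the file name rf_bundle.pkl, and the helper predict_clusters are made up for illustration and are not from the original code. Saving both objects in one file is just one option; two separate files work equally well.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data standing in for df['text'] and the cluster labels (hypothetical).
texts = ["cheap flights to paris", "paris hotel deals",
         "python pickle tutorial", "flask app tutorial"]
labels = [0, 0, 1, 1]

# Training side (notebook): fit the vectorizer once, then fit the model.
vectorizer = TfidfVectorizer(max_features=2 ** 10)
X = vectorizer.fit_transform(texts)
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)
rf_model.fit(X, labels)

# Save both fitted objects together so they cannot get out of sync.
model_path = os.path.join(tempfile.mkdtemp(), "rf_bundle.pkl")
with open(model_path, "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "model": rf_model}, f)

# Serving side (Flask app): load both objects once at startup.
with open(model_path, "rb") as f:
    bundle = pickle.load(f)


def predict_clusters(titles):
    """Vectorize titles with the training-time vocabulary, then predict."""
    # Only transform() here; calling fit or fit_transform would rebuild
    # the vocabulary and break the mapping the model was trained on.
    features = bundle["vectorizer"].transform(titles)
    return bundle["model"].predict(features)


predictions = predict_clusters(["paris flights", "flask tutorial"])
print(predictions)
```

In the Flask app, the loading step would run once at import time and predict_clusters would be called from the route handler. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.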