I am tring to use Random Forest Classifier from scikit learn in Python to predict stock movements. My dataset has 8 features and 1201 records. But after fitting the model and using it to predict, it appears 100% of accuracy and 100% of OOB error. I modified the n_estimators from 100 to a small value, but the OOB error has just dropped few %. Here is my code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
#File reading
df = pd.read_csv('700.csv')
df.drop(df.columns[0],1,inplace=True)
target = df.iloc[:,8]
print(target)
#train test split
X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.3)
#model fit
clf = RandomForestClassifier(n_estimators=100, criterion='gini',oob_score= True)
clf.fit(X_train,y_train)
pred = clf.predict(X_test)
accuaracy = accuracy_score(y_test,pred)
print(clf.oob_score_)
print(accuaracy)
How can I modifiy the code in order to make the oob error drop? Thanks.
oob_score_
is score, not error. The higher, the better. 100% accuracy and 100% oob_score seems fine to me. Are you sure you want to decrease that, or are talking about something else? – Vivek Kumar