I use scikit-learn in Python to train a RandomForestClassifier(). I want to visualize the random forest to understand how the different features relate to each other, so I use export_graphviz() to draw one of its trees:
from subprocess import call

from IPython.display import Image
from sklearn.tree import export_graphviz

# Take the first tree of the fitted forest
estimator1 = best_model1.estimators_[0]

# Export that tree to a Graphviz DOT file
export_graphviz(estimator1,
                out_file='tree_from_optimized_forest.dot',
                rounded=True,
                feature_names=list(X_train.columns),
                class_names=["No", "Yes"],
                filled=True)

# Convert the DOT file to PNG (requires the Graphviz `dot` executable)
call(['dot', '-Tpng', 'tree_from_optimized_forest.dot', '-o', 'tree_from_optimized_forest.png', '-Gdpi=200'])

# Display the PNG in the notebook
Image(filename='tree_from_optimized_forest.png')
However, unlike a single decision tree, a random forest produces many trees; how many is set by the n_estimators parameter of RandomForestClassifier():
from sklearn.ensemble import RandomForestClassifier

best_model1 = RandomForestClassifier(n_estimators=100,
                                     criterion='gini',
                                     random_state=42)
Besides, because DecisionTreeClassifier() uses all the samples to build just one tree, we can interpret the results directly from that single tree.
In contrast, a random forest trains many different trees and then lets them vote to decide the result. The trees also differ from one another because random forests rely on techniques such as bootstrap sampling, bagging, and out-of-bag evaluation.
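To make this difference concrete, here is a minimal sketch on synthetic data. It assumes make_classification as a stand-in for my real X_train and y_train, and the names forest, root_features, and avg_proba are only illustrative. It checks that the forest holds one fitted tree per n_estimators, that the trees do not all split on the same feature at the root, and that the forest's output is an aggregate over all trees. As far as I understand, scikit-learn aggregates by averaging the trees' predicted class probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real training data (an assumption for illustration)
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=4, random_state=0)

forest = RandomForestClassifier(n_estimators=100,
                                criterion="gini",
                                random_state=42).fit(X, y)

# The forest stores one fitted DecisionTreeClassifier per n_estimators
n_trees = len(forest.estimators_)

# Index of the feature used for the root split of each tree
root_features = [tree.tree_.feature[0] for tree in forest.estimators_]
n_distinct_roots = len(set(root_features))

# Average the per-tree class probabilities and compare with the forest's output
avg_proba = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
matches_forest = np.allclose(avg_proba, forest.predict_proba(X))

print(n_trees, n_distinct_roots, matches_forest)
```

So estimators_[0] is only one of the n_trees contributors to the forest's prediction, which is why I am unsure how much a single tree's picture can say about the whole model.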
Therefore, my question is: if I visualize only one of the trees produced by RandomForestClassifier(), does that tree have any value as a reference?
Can I interpret the content of this tree as the analysis result for the whole dataset? If not, is DecisionTreeClassifier() the only way to analyze the relationships between features through a visualized tree?
Thanks a lot!!