
I'm using the decision tree classifier from sklearn, but I'm getting a 100% score and I don't know what's wrong. I have tested SVM and KNN, and both give 60% to 80% accuracy and seem OK. Here is my code:

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    maxScore = 0
    index = 0
    Depths = [1, 5, 10, 20, 40]
    for i, d in enumerate(Depths):
        clf1 = DecisionTreeClassifier(max_depth=d)
        score = cross_val_score(clf1, X_train, Y_train, cv=10).mean()
        index = i if score > maxScore else index
        maxScore = max(score, maxScore)
        print('The cross val score for Decision Tree classifier (max_depth=' +
              str(d) + ') is ' + str(score))

    d = Depths[index]
    print()
    print("So the best value for max_depth parameter is " + str(d))
    print()

    # Classifying
    clf1 = DecisionTreeClassifier(max_depth=d)
    clf1.fit(X_train, Y_train)
    preds = clf1.predict(X_valid)
    print("The accuracy obtained using Decision tree classifier is {0:.8f}%"
          .format(100 * clf1.score(X_valid, Y_valid)))

and here is the output:

    The cross val score for Decision Tree classifier (max_depth=1) is 1.0
    The cross val score for Decision Tree classifier (max_depth=5) is 0.9996212121212121
    The cross val score for Decision Tree classifier (max_depth=10) is 1.0
    The cross val score for Decision Tree classifier (max_depth=20) is 1.0
    The cross val score for Decision Tree classifier (max_depth=40) is 0.9996212121212121

    So the best value for max_depth parameter is 1

    The accuracy obtained using Decision tree classifier is 100.00000000%

Comments:

Harut Hunanyan: If this helps, I'll post this as an answer.

Baset Veisy: Well, that was exactly the case. Thanks a lot. What should I do to make the results better?

Harut Hunanyan: Well, it depends on your problem. First of all, can you say why you consider this a problem? I'm not sure, but I guess it's because you think your model is overfitted. As the problem mostly depends on the data, all I can recommend is using another model with a different approach.

Harut Hunanyan: Everything I said is now added as an answer.

Baset Veisy: Thanks a lot. There actually was a feature in my matrix that fully described the target values.

1 Answer


I think there's an obvious conclusion: your labels are highly correlated with some of the features, or at least with one of them. In other words, your data may contain a feature that leaks the target.

Anyway, you can check how a single feature split of your decision tree model affects the model's predictions.
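For instance, a depth-1 tree makes exactly one split, so printing it shows which feature it picked. Here's a minimal sketch using `sklearn.tree.export_text`, with synthetic data standing in for your `X_train`/`Y_train` (the leak through feature 0 is constructed for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: the label is fully determined by feature 0,
# mimicking a leaking feature in the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

# A depth-1 tree (a "stump") learns a single split.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(export_text(stump, feature_names=[f"f{i}" for i in range(4)]))
```

If one feature leaks the target, that single split already separates the classes perfectly, which is exactly the 100% score you are seeing.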

Use the model.feature_importances_ attribute to see how 'important' each feature is for the model's predictions.
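A short sketch of how that inspection might look, again on synthetic data (here the label is deliberately leaked through feature 2, so its importance dominates):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with a leaking feature at index 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 2] > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Rank features from most to least important.
for i in np.argsort(clf.feature_importances_)[::-1]:
    print(f"feature {i}: importance {clf.feature_importances_[i]:.3f}")
```

A single importance close to 1.0, with all others near zero, is a strong sign of a leaking feature.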

Check the DecisionTreeClassifier documentation.

If you still think your model's predictions aren't good enough, I recommend switching to a model with a different approach. If you have to stick with decision trees, you can try the Random Forest classifier.

It is an ensemble model. The basic idea of ensemble learning is that the final prediction is based on the predictions of multiple weaker models, called weak learners. Check the main approaches to building ensemble models.

In the case of the Random Forest classifier, the weak learners are decision trees with small depth. Each tree makes its splits using only a small subset of the features, chosen randomly each time. The number of features considered is a hyper-parameter, so it needs to be tuned.
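The idea above might be sketched like this, tuning the `max_features` hyper-parameter of `RandomForestClassifier` by cross-validation on a synthetic dataset (the dataset and candidate values are illustrative, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification problem standing in for the real data.
X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=5, random_state=0)

# Try a few settings for the number of features each split may consider.
for mf in ["sqrt", "log2", 0.5]:
    rf = RandomForestClassifier(n_estimators=100, max_depth=5,
                                max_features=mf, random_state=0)
    score = cross_val_score(rf, X, y, cv=5).mean()
    print(f"max_features={mf}: mean CV accuracy {score:.3f}")
```

Because each tree sees a random feature subset, a single leaking feature has less chance to dominate every tree, though it will still surface in the forest's feature importances.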

Check the links and other tutorials for more information.