Decision trees by labels

Question

I need to do a decision tree and represent the data by labels on a graph (as showed in the two illustrations). I have no problem for the decision tree, unfortunately, the dots are not imput in the graph. I have tried several changes in the code. The code come from the scikit learn website Plot the decision surface of a decision tree on the iris dataset

There is an example of the data used (X,Y, C5) on below (from an excel file) :

Path = "Documents/Apprentissage/Python/Script/ClustAllRepres12.xlsx"
Wordbook = xlrd.open_workbook(Path)
Sheet = Wordbook.sheet_by_index(0)

X=[]
Y=[]
C5=[]

for i in range(1, Sheet.nrows):
    X.append(Sheet.cell_value(i, 0))
    Y.append(Sheet.cell_value(i, 1))
    C5.append(Sheet.cell_value(i, 8))    

X

Out[]: [8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 11.0, 12.0, 12.0, 12.0, 12.0, 12.0, 13.0, 13.0, 13.0, 14.0, 14.0, 14.0, 15.0, 15.0, 15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 17.0, 17.0, 17.0, 17.0, 18.0, 18.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 20.0, 21.0, 21.0, 22.0]

type(X)

Out[]: list

Out[]: [45.0, 17.0, 18.0, 24.0, 25.0, 27.0, 36.0, 38.0, 39.0, 24.0, 37.0, 40.0, 24.0, 31.0, 35.0, 36.0, 37.0, 39.0, 32.0, 33.0, 35.0, 43.0, 27.0, 31.0, 35.0, 42.0, 18.0, 39.0, 43.0, 31.0, 42.0, 28.0, 32.0, 35.0, 43.0, 51.0, 52.0, 17.0, 19.0, 53.0, 49.0, 51.0, 53.0, 58.0, 16.0, 58.0, 59.0, 50.0, 52.0, 54.0]

type(Y)

Out[]: list

C5

Out[]: [2.0, 4.0, 3.0, 3.0, 3.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.0, 4.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 5.0, 1.0, 3.0, 1.0, 1.0, 4.0, 4.0, 4.0, 1.0, 5.0, 1.0, 5.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

type(C5)

Out[]: list

There is the code from the scikit learn website:

import pandas as pd 
import numpy as np
import xlrd
import matplotlib.pyplot as plt

df_list = pd.DataFrame(
    {'X': X,
     'Y': Y,
    }) 
df = df_list.iloc[:,0:2].values  #transform to float type 

import sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree

#parameters
n_classes = 5
plot_colors = "ryb"
plot_step = 0.02

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],[1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = df #gives better result without [:, pair]
    y = C5

#train
clf = DecisionTreeClassifier().fit(X, y)

#plot the decision boundary
plt.subplot(2, 3, pairidx +1)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))
plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

# Plot the training points
for i, color in zip(range(n_classes), plot_colors):
    idx = np.where(y == i)
    plt.scatter(X[idx, 0], X[idx, 1], c=color, label=C5[i],
                cmap=plt.cm.RdYlBu, edgecolor='black', s=15)
plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")
plt.figure()
clf = DecisionTreeClassifier().fit(df_list, C5)
plot_tree(clf, filled=True)
plt.show()

There is the graph I have :

As you can see, the dots are not represented in the graphs. I give an example of the the awaited result :

Please remember that the best questions contain a reproducible example, but that the example in question is also minimal. — PiRocks
You have to narrow down your issue to one specific thing. Where is the earliest part things are wrong/incomplete? — smci
I have developed the question, I have also extended the dataset @PiRocks — C. Guffens
The problem is there are no dots in the output graphs (I have added a picture of my output) @NicolasGervais — C. Guffens

saravanan saminathan saravanan saminathan · Accepted Answer · 2019-12-12T02:41:56

idx = np.where(y == i)

In this line y is a list In order to return the index of of list which equals to i(variable) Just convert to the list to array everything will works fine

So just add this line before plotting scatter plot

**y = np.asarray(y)** # convert list to array since where will work for array only
# Plot the training points
for i, color in .....

Decision trees by labels

1 Answers