1
votes

I need to do a decision tree and represent the data by labels on a graph (as showed in the two illustrations). I have no problem for the decision tree, unfortunately, the dots are not imput in the graph. I have tried several changes in the code. The code come from the scikit learn website Plot the decision surface of a decision tree on the iris dataset


There is an example of the data used (X,Y, C5) on below (from an excel file) :

Path = "Documents/Apprentissage/Python/Script/ClustAllRepres12.xlsx"
Wordbook = xlrd.open_workbook(Path)
Sheet = Wordbook.sheet_by_index(0)

X=[]
Y=[]
C5=[]

for i in range(1, Sheet.nrows):
    X.append(Sheet.cell_value(i, 0))
    Y.append(Sheet.cell_value(i, 1))
    C5.append(Sheet.cell_value(i, 8))    

X

Out[]: [8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 11.0, 12.0, 12.0, 12.0, 12.0, 12.0, 13.0, 13.0, 13.0, 14.0, 14.0, 14.0, 15.0, 15.0, 15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 17.0, 17.0, 17.0, 17.0, 18.0, 18.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 20.0, 21.0, 21.0, 22.0]

type(X)

Out[]: list

Y

Out[]: [45.0, 17.0, 18.0, 24.0, 25.0, 27.0, 36.0, 38.0, 39.0, 24.0, 37.0, 40.0, 24.0, 31.0, 35.0, 36.0, 37.0, 39.0, 32.0, 33.0, 35.0, 43.0, 27.0, 31.0, 35.0, 42.0, 18.0, 39.0, 43.0, 31.0, 42.0, 28.0, 32.0, 35.0, 43.0, 51.0, 52.0, 17.0, 19.0, 53.0, 49.0, 51.0, 53.0, 58.0, 16.0, 58.0, 59.0, 50.0, 52.0, 54.0]

type(Y)

Out[]: list

C5

Out[]: [2.0, 4.0, 3.0, 3.0, 3.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.0, 4.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 5.0, 1.0, 3.0, 1.0, 1.0, 4.0, 4.0, 4.0, 1.0, 5.0, 1.0, 5.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

type(C5)

Out[]: list


There is the code from the scikit learn website:

import pandas as pd 
import numpy as np
import xlrd
import matplotlib.pyplot as plt

df_list = pd.DataFrame(
    {'X': X,
     'Y': Y,
    }) 
df = df_list.iloc[:,0:2].values  #transform to float type 

import sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree

#parameters
n_classes = 5
plot_colors = "ryb"
plot_step = 0.02

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],[1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = df #gives better result without [:, pair]
    y = C5

#train
clf = DecisionTreeClassifier().fit(X, y)

#plot the decision boundary
plt.subplot(2, 3, pairidx +1)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))
plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

# Plot the training points
for i, color in zip(range(n_classes), plot_colors):
    idx = np.where(y == i)
    plt.scatter(X[idx, 0], X[idx, 1], c=color, label=C5[i],
                cmap=plt.cm.RdYlBu, edgecolor='black', s=15)
plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")
plt.figure()
clf = DecisionTreeClassifier().fit(df_list, C5)
plot_tree(clf, filled=True)
plt.show()

There is the graph I have :

enter image description here

As you can see, the dots are not represented in the graphs. I give an example of the the awaited result :

enter image description here

1
Please remember that the best questions contain a reproducible example, but that the example in question is also minimal.PiRocks
so what is your problem exactly?Nicolas Gervais
You have to narrow down your issue to one specific thing. Where is the earliest part things are wrong/incomplete?smci
I have developed the question, I have also extended the dataset @PiRocksC. Guffens
The problem is there are no dots in the output graphs (I have added a picture of my output) @NicolasGervaisC. Guffens

1 Answers

0
votes

idx = np.where(y == i)

In this line y is a list In order to return the index of of list which equals to i(variable) Just convert to the list to array everything will works fine

So just add this line before plotting scatter plot

**y = np.asarray(y)** # convert list to array since where will work for array only
# Plot the training points
for i, color in .....