
I am loading the iris dataset and plotting the features on an xy plane. I want to plot sepal length against sepal width and overlay the categorical species values (setosa, versicolor, virginica). I am passing df_iris['species'] to plt.legend, but only setosa shows up in the legend. Any ideas on what I am doing wrong?

from sklearn import datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

iris = datasets.load_iris()
print(type(iris))

# Convert the sklearn.utils.Bunch object to a DataFrame
df_iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']], columns=iris['feature_names'] + ['target'])
df_iris['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

df_iris['species']
# Since there are 4 features, there are 6 possible pairs to plot on an xy coordinate system
plt.scatter(df_iris['sepal length (cm)'], df_iris['sepal width (cm)'])
plt.grid(True)
plt.legend(df_iris['species'], loc='lower right')  # only "setosa" appears in the legend
plt.show()
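A minimal sketch to inspect what is going on, assuming scikit-learn, pandas and matplotlib are installed (the Agg backend is only there so it runs without a display). A single scatter call produces a single artist, and when legend() is given only a list of labels it pairs them with the existing artists one by one, so only the first label is used:

```python
# Sketch: inspect why legend(df_iris['species']) shows only "setosa".
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df_iris = pd.DataFrame(iris['data'], columns=iris['feature_names'])
df_iris['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

fig, ax = plt.subplots()
# One scatter call for all 150 points -> one PathCollection artist
ax.scatter(df_iris['sepal length (cm)'], df_iris['sepal width (cm)'])
print(len(ax.collections))  # 1

# legend() zips the given labels with the existing artists; with only one
# artist, only the first label (row 0, "setosa") ends up in the legend.
leg = ax.legend(df_iris['species'], loc='lower right')
legend_labels = [t.get_text() for t in leg.get_texts()]
print(legend_labels)  # ['setosa']
```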


You are really complicating things by not using seaborn for this type of plot: iris = sns.load_dataset('iris'); sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species') – JohanC
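The same hue= approach can be applied to the DataFrame built from sklearn, which avoids the download that sns.load_dataset performs. This is a sketch assuming seaborn is installed; note the sklearn column names ('sepal length (cm)') differ from seaborn's bundled dataset ('sepal_length'):

```python
# Sketch: seaborn's hue= on the sklearn-built DataFrame (no dataset download).
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()
df_iris = pd.DataFrame(iris['data'], columns=iris['feature_names'])
df_iris['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# hue= splits the points by species and adds one legend entry per category
ax = sns.scatterplot(data=df_iris, x='sepal length (cm)', y='sepal width (cm)', hue='species')
ax.grid(True)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```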

1 Answer


Try this code:

from sklearn import datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

iris = datasets.load_iris()

# Convert the sklearn.utils.Bunch object to a DataFrame
df_iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']], columns=iris['feature_names'] + ['target'])
df_iris['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Since there are 4 features, there are 6 possible pairs to plot on an xy coordinate system.
# Plot each species with its own scatter call so that each one gets its own legend entry
# (iterating the categories keeps a fixed order, unlike set()).
for species in df_iris['species'].cat.categories:
    df_species = df_iris[df_iris['species'] == species]
    plt.scatter(df_species['sepal length (cm)'], df_species['sepal width (cm)'], label=species)
plt.grid(True)
plt.legend(loc='lower right')
plt.show()

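You need to plot the differently labeled points separately. A single plt.scatter call creates only one artist, so plt.legend(df_iris['species']) pairs just the first label (setosa) with it. Calling plt.scatter once per species with label=species gives each species its own artist, and plt.legend() then picks up all three labels.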
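The per-species loop extends naturally to all 6 feature pairs mentioned in the code comment (4 features taken 2 at a time gives C(4, 2) = 6). This is a sketch assuming scikit-learn, pandas and matplotlib; the 2x3 layout, marker size and groupby-based loop are illustrative choices, not part of the answer above:

```python
# Sketch: one scatter call per species, repeated for every pair of features.
from itertools import combinations

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df_iris = pd.DataFrame(iris['data'], columns=iris['feature_names'])
df_iris['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# 4 features taken 2 at a time: C(4, 2) = 6 pairs
pairs = list(combinations(iris['feature_names'], 2))
print(len(pairs))  # 6

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, (x_col, y_col) in zip(axes.flat, pairs):
    # groupby yields each species with its rows; one artist per species
    for species, df_species in df_iris.groupby('species', observed=True):
        ax.scatter(df_species[x_col], df_species[y_col], label=species, s=10)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.grid(True)

# The color cycle restarts for every Axes, so species colors match across
# panels and a single legend is enough.
axes[0, 0].legend(loc='lower right')
fig.tight_layout()
```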