Matplotlib scatter legend with colors using categorical variable

Question

I have made a simple scatterplot using matplotlib showing data from 2 numerical variables (varA and varB) with colors that I defined with a 3rd categorical string variable (col) containing 10 unique colors (corresponding to another string variable with 10 unique names), all in the same Pandas DataFrame with 100+ rows. Is there an easy way to create a legend for this scatterplot that shows the unique colored dots and their corresponding category names? Or should I somehow group the data and plot each category in a subplot to do this? This is what I have so far:

import matplotlib.pyplot as plt
from matplotlib import colors as mcolors

varA = df['A']
varB = df['B'] 
col = df['Color']

plt.scatter(varA,varB, c=col, alpha=0.8)
plt.legend()

plt.show()

See matplotlib-scatterplot-with-legend, setting-a-legend-matching-the-colours-in-pyplot-scatter — ImportanceOfBeingErnest

harvpan · Accepted Answer · 2018-05-08T17:58:29

Assuming Color is the column that holds both the colors and the labels, you can plot each color separately so that every group gets its own legend entry:

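import matplotlib.pyplot as plt

# One scatter call per unique color, so each call contributes one legend entry.
for color in df['Color'].unique():
    data = df.loc[df['Color'] == color]
    # Pass the color value itself; it doubles as the legend label here.
    plt.scatter(data['A'], data['B'], color=color, alpha=0.8, label=color)
plt.legend()
plt.show()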
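
The question also mentions a second string column holding the category names. Here is a minimal sketch of the same per-group idea that labels each entry with the name instead of the color. The column name "Name" and the sample data are assumptions made up for illustration; they are not in the original post.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frame standing in for the asker's df; "Name" is a hypothetical column.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 1, 4, 3, 6, 5],
    "Color": ["red", "red", "blue", "blue", "green", "green"],
    "Name": ["cats", "cats", "dogs", "dogs", "birds", "birds"],
})

fig, ax = plt.subplots()
# Group on both columns: one scatter call (and one legend entry) per category.
# sort=False keeps the groups in order of first appearance.
for (name, color), group in df.groupby(["Name", "Color"], sort=False):
    ax.scatter(group["A"], group["B"], color=color, alpha=0.8, label=name)
ax.legend(title="Category")
```

Call plt.show() afterwards to display the figure. Everything stays on a single axes, so no subplots are needed; grouping only decides which points share a legend entry.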

