3
votes

When using seaborn, is there a way to include multiple variables (columns) in the hue parameter? Put another way, how can I group my data by multiple variables before plotting them on a single set of x/y axes?

I want to do something like the following, but I currently cannot pass two variables to the hue parameter:

sns.relplot(x='#', y='Attack', hue=['Legendary', 'Stage'], data=df)

For example, assume I have a pandas DataFrame like the one below, containing a Pokemon database obtained via this tutorial.

[Image: preview of the DataFrame]

I want to plot the pokedex # on the x-axis and the Attack on the y-axis. However, I want the data to be grouped by both Stage and Legendary. Using matplotlib, I wrote a custom function that groups the dataframe by ['Legendary','Stage'] and then iterates through each group for the plotting (see results below). Although my custom function works as intended, I was hoping this could be achieved more simply with seaborn. I am guessing there must be other people who have attempted to visualize more than 3 variables in a single plot using seaborn?

fig, ax = plt.subplots()
grouping_variables = ['Stage','Legendary']
group_1 = df.groupby(grouping_variables)
for group_1_label, group_1_df in group_1:
    ax.scatter(group_1_df['#'], group_1_df['Attack'], label=group_1_label)
ax_legend = ax.legend(title=grouping_variables)    

[Image: scatter plot produced by the code above]

Edit 1:

Note: In the example I provided, I grouped the data by only two variables (e.g. Legendary and Stage). However, other situations may require an arbitrary number of variables (e.g. 5 variables).

2 Answers

4
votes

To use the hue parameter of seaborn.relplot, consider concatenating the needed grouping columns into a single column and then plotting with that new column as hue:

def run_plot(df, flds):
   # CREATE NEW COLUMN OF CONCATENATED VALUES
   # (index=df.index keeps the new values aligned if df has a non-default index)
   df['_'.join(flds)] =  pd.Series(df.reindex(flds, axis='columns')
                                     .astype('str')
                                     .values.tolist(),
                                   index=df.index
                                  ).str.join('_')

   # PLOT WITH hue (use the df argument rather than the global random_df)
   sns.relplot(x='#', y='Attack', hue='_'.join(flds), data=df, aspect=1.5)
   plt.show()

   plt.clf()
   plt.close()
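
The concatenation step above can also be written more compactly. The following is only a sketch of one alternative, not part of the original answer: combine_columns is a hypothetical helper name and the toy frame is made up for illustration. It returns the combined labels as a Series that shares the frame's index, so it stays aligned even when the index is not 0..n-1.

```python
import pandas as pd

def combine_columns(df, cols, sep="_"):
    """Return one string label per row, built by joining the values in `cols`."""
    # Cast every grouping column to str, then join the values row by row
    return df[cols].astype(str).agg(sep.join, axis=1)

# Toy frame (made up) with a non-default index
toy = pd.DataFrame(
    {"Legend": [True, False, True], "Stage": [1, 2, 2]},
    index=[10, 20, 30],
)

labels = combine_columns(toy, ["Legend", "Stage"])
toy["Legend_Stage"] = labels  # this column can then be passed as hue=
print(toy)
```

Because cols is a plain list, the same helper should work for any number of grouping columns, which is the arbitrary-variable case raised in the question's edit.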

To demonstrate with random data:

Data

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

### DATA
np.random.seed(22320)
random_df = pd.DataFrame({'#': np.arange(1,501),
                          'Name': np.random.choice(['Bulbasaur', 'Ivysaur', 'Venusaur', 
                                                    'Charmander', 'Charmeleon'], 500),
                          'HP': np.random.randint(1, 100, 500),
                          'Attack': np.random.randint(1, 100, 500),
                          'Defense': np.random.randint(1, 100, 500),
                          'Sp. Atk': np.random.randint(1, 100, 500),
                          'Sp. Def': np.random.randint(1, 100, 500),
                          'Speed': np.random.randint(1, 100, 500),
                          'Stage': np.random.randint(1, 3, 500),
                          'Legend': np.random.choice([True, False], 500)
                          })

Plots

run_plot(random_df, ['Legend', 'Stage'])

Two Group Plot Output

run_plot(random_df, ['Legend', 'Stage', 'Name'])

Three Group Plot

0
votes

In seaborn's scatterplot(), you can combine the hue= and style= parameters to get a different color and a different marker for each combination of two variables.

Example (adapted from the documentation):

tips = sns.load_dataset("tips")
ax = sns.scatterplot(x="total_bill", y="tip",
                     hue="day", style="time", data=tips)
[Image: resulting scatter plot]
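
Applied to the columns from the question, the same idea might look like the sketch below. The toy frame is a made-up stand-in for the Pokemon table, and the Agg backend is selected only so the snippet runs without a display. Note that this approach encodes at most a few variables (hue, style, and optionally size), so it does not cover an arbitrary number of grouping columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the question's Pokemon DataFrame
toy = pd.DataFrame({
    "#": [1, 2, 3, 4, 5, 6],
    "Attack": [49, 62, 82, 52, 64, 84],
    "Stage": [1, 2, 3, 1, 2, 3],
    "Legendary": [False, False, False, False, True, True],
})

fig, ax = plt.subplots()
# Color encodes Stage, marker shape encodes Legendary
sns.scatterplot(x="#", y="Attack", hue="Stage", style="Legendary",
                data=toy, ax=ax)
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure.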