
I'm looking to create a line graph with the y axis having multiple lines for each unique entry found within my dataframe column.

My dataframe looks like this –

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'command': ['start', 'start', 'hold',
                               'release', 'hold', 'start',
                               'hold', 'hold', 'hold'],
                   'name': ['fred', 'wilma', 'barney',
                            'fred', 'barney', 'betty',
                            'pebbles', 'dino', 'wilma'],
                   'date': ['2020-05', '2020-05', '2020-05',
                            '2020-06', '2020-06', '2020-06',
                            '2020-07', '2020-07', '2020-07']})

I'm trying to create a line graph with the date on the x axis and a separate line for each of the command entries (start, hold, and release in this example), where the y value is the number of rows for that command on that date.

I tried using a groupby then executing this –

dfg = df.groupby(['command', 'date']).size()

for i in dfg.command.unique():
    x = dfg[dfg.command==i]['date']
    y = dfg[dfg.command==i]['size']
    plt.plot(x, y)
plt.show()

However I get this error - AttributeError: 'Series' object has no attribute 'command'

I've also tried creating a pivot table and building the graph from there as follows -

df_pv = pd.pivot_table(df, index=['command', 'date'],
                       values='name',
                       aggfunc='count')
df_pv.rename(columns={'name': 'count'}, inplace=True)

for i in df_pv.command.unique():
    x = df_pv[df_pv.command==i]['date']
    y = df_pv[df_pv.command==i]['count']
    plt.plot(x, y)
plt.show()

However it returns the error - AttributeError: 'DataFrame' object has no attribute 'command'

I'm not sure if I'm missing something in my approach?

Or if there is a better method of achieving this?

Thanks.


1 Answer


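You were very close. As the first error indicates, df.groupby(['command', 'date']).size() returns a Series with a MultiIndex, so 'command' and 'date' are index levels rather than columns. If you want to work with them as columns, you can turn the Series into a DataFrame using .reset_index(). The counts then end up in a column named 0: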

dfg = df.groupby(['command', 'date']).size().reset_index()

fig, ax = plt.subplots()
for com in dfg['command'].unique():
    mask = dfg['command'] == com  # rows belonging to this command
    # after reset_index(), the counts live in the column named 0
    ax.plot(dfg.loc[mask, 'date'], dfg.loc[mask, 0], 'o-', label=com)
ax.legend()
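
To see why dfg.command fails on the grouped result, it can help to inspect the object before and after .reset_index(). The following is only a small sketch on the question's data (the 'name' column is left out because it is not needed for counting, and the variable names counts and flat are mine):

```python
import pandas as pd

df = pd.DataFrame({
    'command': ['start', 'start', 'hold', 'release', 'hold',
                'start', 'hold', 'hold', 'hold'],
    'date': ['2020-05', '2020-05', '2020-05', '2020-06', '2020-06',
             '2020-06', '2020-07', '2020-07', '2020-07'],
})

# size() returns a Series; the group keys become index levels, not columns
counts = df.groupby(['command', 'date']).size()
print(type(counts).__name__)
print(list(counts.index.names))

# reset_index() moves the levels back into ordinary columns;
# the unnamed counts end up in a column labelled 0
flat = counts.reset_index()
print(list(flat.columns))
```

If the integer column label feels awkward, .reset_index(name='count') gives the counts column a readable name instead, in which case the plotting code would have to refer to 'count' rather than 0.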

Note that you could also work with the MultiIndex directly (although I generally find it more cumbersome). You can iterate over a specific level of the MultiIndex using groupby(level=) and access the contents of a given level using MultiIndex.get_level_values():

dfg = df.groupby(['command', 'date']).size()

fig,ax = plt.subplots()
for com,subdf in dfg.groupby(level=0):
    ax.plot(subdf.index.get_level_values(level=1),subdf.values,'o-', label=com)
ax.legend()
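
Another way to use the MultiIndex, not part of the solutions above, is to pivot one level into columns with .unstack() and let pandas draw one line per column. This is a sketch under a few assumptions: the grouping order is swapped to ['date', 'command'] so that the dates become the row index, the name wide is mine, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'command': ['start', 'start', 'hold', 'release', 'hold',
                'start', 'hold', 'hold', 'hold'],
    'date': ['2020-05', '2020-05', '2020-05', '2020-06', '2020-06',
             '2020-06', '2020-07', '2020-07', '2020-07'],
})

# rows: dates, columns: commands, values: number of rows in each group
wide = df.groupby(['date', 'command']).size().unstack('command')

# DataFrame.plot draws one line per column and builds the legend itself;
# markers keep isolated points visible (e.g. 'release' occurs in one month only)
ax = wide.plot(marker='o')
ax.set_ylabel('count')
```

Combinations that never occur are NaN in wide, so those points are simply not drawn. Passing fill_value=0 to .unstack() would plot them as zero instead, which is a different picture, so which one is appropriate depends on what a missing month should mean.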

Finally, if you want to save yourself the trouble of writing the loop, you could use seaborn, which is pretty easy to use for this kind of plot (although you will need to transform your dataframe as in the first solution):

import seaborn as sns

dfg = df.groupby(['command', 'date']).size().reset_index()
plt.figure()
# hue='command' draws one line per command; the counts are in column 0
sns.lineplot(data=dfg, x='date', y=0, hue='command', marker='o')

If you want to be really fancy, you can dispense with transforming your original dataframe yourself and let seaborn.lineplot() do it, by instructing it how to aggregate the values for each date:

sns.lineplot(data=df, x='date', y=0, hue='command', estimator=pd.value_counts, marker='o')

All of these solutions yield the same output, with some minor aesthetic differences.
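
The pivot-table attempt from the question fails for the same reason: with index=['command', 'date'], both keys end up in the row MultiIndex of the result, so df_pv.command does not exist. A minimal sketch of the same .reset_index() fix applied there (printing the per-command values instead of plotting, to keep it short):

```python
import pandas as pd

df = pd.DataFrame({
    'command': ['start', 'start', 'hold', 'release', 'hold',
                'start', 'hold', 'hold', 'hold'],
    'name': ['fred', 'wilma', 'barney', 'fred', 'barney',
             'betty', 'pebbles', 'dino', 'wilma'],
    'date': ['2020-05', '2020-05', '2020-05', '2020-06', '2020-06',
             '2020-06', '2020-07', '2020-07', '2020-07'],
})

df_pv = pd.pivot_table(df, index=['command', 'date'],
                       values='name', aggfunc='count')

# rename the counted column, then move 'command' and 'date' back into columns
df_pv = df_pv.rename(columns={'name': 'count'}).reset_index()

# each sub-frame now has plain 'date' and 'count' columns to pass to plt.plot
for com, sub in df_pv.groupby('command'):
    print(com, sub['date'].tolist(), sub['count'].tolist())
```

With the keys available as columns, the loop from the question works once df_pv.command is accessible, or the groupby loop shown here can feed plt.plot(sub['date'], sub['count'], label=com) directly.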