
I have a DataFrame with MultiIndex columns (City, Year) that looks like the data below. When I plot it, the graph looks like the image below.

How can I plot a bar graph where the color of the bars is determined by a category of my choice (e.g. 'City')? That is, all bars belonging to the same city should have the same color, regardless of the year. For example, in the graph below all ATL bars should be red and all MIA bars should be blue.

[Image: the current bar chart]

City            ATL                                    MIA
Year           2010         2011         2012         2010         2011         2012
Taste
Bitter  3159.861983  3149.806667  2042.348937  3124.586470  3119.541240  1070.925695
Sour    1078.897032  3204.689424  3065.818991  2084.322056  2108.568495  3178.131540
Spicy   5280.847114  3134.597728  1015.311288  2036.494136  1001.532560  3164.382635
Sweet   1056.169267  1015.368646  4217.145165  3134.734027  4144.826118  3173.919338

Below is my code:

import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random

matplotlib.style.use('ggplot')

def main():

    taste = ['Sweet','Spicy','Sour','Bitter']
    store = ['Asian','Italian','American','Greek','Mexican']

    df1 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df2 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df3 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df4 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df5 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})


    df6 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})



    df1['Year'] = '2010'
    df1['City'] = 'MIA'

    df2['Year'] = '2011'
    df2['City'] = 'MIA'

    df3['Year'] = '2012'
    df3['City'] = 'MIA'

    df4['Year'] = '2010'
    df4['City'] = 'ATL'

    df5['Year'] = '2011'
    df5['City'] = 'ATL'

    df6['Year'] = '2012'
    df6['City'] = 'ATL'


    DF = pd.concat([df1,df2,df3,df4,df5,df6])
    DFG = DF.groupby(['Taste', 'Year', 'City'])
    # Sum 'Sold' per group, then move City and Year into the columns (City on top)
    DFGSum = DFG['Sold'].sum().unstack(['City', 'Year']).sort_index(axis=1)
    print(DFGSum)

    '''
    In my plot, I want the color of the bars to be determined by the "City".
    For example: All "ATL" bar colors will be the same regardless of the year.
    '''
    DFGSum.plot(kind='bar')


    plt.show()

if __name__ == '__main__':
    main()

2 Answers


Edited to include color cycling and an arbitrary number of cities.

You will need to specify a few extra arguments to make it look nice, but something like this might work:

import itertools  # for color cycling
import numpy as np
import matplotlib.pyplot as plt

# specify the colors you want for each city
# ('axes.color_cycle' was replaced by 'axes.prop_cycle' in matplotlib 2.0)
color_cycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
colors = {cty: next(color_cycle) for cty in DF.City.unique()}

# specify the relative position of each bar
n = len(DFGSum.columns)
positions = np.linspace(-n/2., n/2., n)

# plot each column individually
for i, col in enumerate(DFGSum.columns):
    c = colors[col[0]]  # col is a (City, Year) tuple
    pos = positions[i]
    DFGSum[col].plot(kind='bar', color=c,
                     position=pos, width=0.05)

plt.legend()
plt.show()

[Image: bar chart in which all bars of the same city share one color]

With this approach, however, you cannot tell which bar corresponds to which year.
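One way to avoid tuning `position` and `width` by hand is to compute the side-by-side layout yourself with matplotlib's `ax.bar`: each (City, Year) column gets its own offset inside the group of bars for each taste, and its color is looked up by city. This is a minimal sketch, not part of the answer above; it assumes `DFGSum` has (City, Year) columns as in the question, builds a small stand-in `DFGSum` with made-up numbers, and `group_width` is an arbitrary choice. Within each taste group, bars appear in column order.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative stand-in for DFGSum (made-up numbers): rows are tastes, columns are (City, Year)
columns = pd.MultiIndex.from_tuples(
    [('ATL', '2010'), ('ATL', '2011'), ('MIA', '2010'), ('MIA', '2011')],
    names=['City', 'Year'])
DFGSum = pd.DataFrame([[3159.9, 3149.8, 3124.6, 3119.5],
                       [1078.9, 3204.7, 2084.3, 2108.6]],
                      index=pd.Index(['Bitter', 'Sour'], name='Taste'),
                      columns=columns)

# One color per city, taken from the current matplotlib color cycle
color_cycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
colors = {cty: next(color_cycle)
          for cty in DFGSum.columns.get_level_values('City').unique()}

n = len(DFGSum.columns)
group_width = 0.8                 # total width of one taste's group of bars (arbitrary)
bar_width = group_width / n
x = np.arange(len(DFGSum.index))  # one group of bars per taste

fig, ax = plt.subplots()
seen = set()
for i, (city, year) in enumerate(DFGSum.columns):
    # shift column i so that the n bars sit side by side, centered on x
    offset = (i - (n - 1) / 2) * bar_width
    # put each city in the legend only once
    label = city if city not in seen else '_nolegend_'
    seen.add(city)
    ax.bar(x + offset, DFGSum[(city, year)].to_numpy(), width=bar_width,
           color=colors[city], label=label)

ax.set_xticks(x)
ax.set_xticklabels(DFGSum.index)
ax.legend()
```

Call `plt.show()` afterwards to display the figure. Because the offsets and widths are derived from `n` and `group_width`, adding a city or a year does not require retuning anything.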

Alternate solution

You can also make a slightly different kind of plot that preserves the year information in the tick labels. This generalizes to any number of cities and keeps the default color style:

from functools import reduce  # a builtin on Python 2, moved to functools on Python 3

df = DFG[['Sold']].sum().reset_index().set_index(['Taste','Year'])
u_cty = df.City.unique() #array(['ATL', 'MIA'], dtype=object)
df_list = []
for cty in u_cty:
    d = df.loc[df.City == cty]
    d = d[['Sold']].rename(columns={'Sold': cty}).reset_index()
    df_list.append(d)

# merge the per-city dataframes on (Taste, Year)
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Taste','Year'], how='outer'), df_list)
df_merged.set_index(['Taste','Year'], inplace=True)
print(df_merged)

                     ATL          MIA
Taste  Year                          
Bitter 2010  3211.239754  2070.907629
       2011  2158.068222  2145.373251
       2012  2138.624730  1062.306874
Sour   2010  4188.024600          NaN
       2011  4323.003409          NaN
       2012  1042.772615  2136.742869
Spicy  2010  1018.737977  3155.450265
       2012  4171.954201  2096.569762
Sweet  2010  2098.679545  5324.078957
       2011  4215.376670  2115.964824
       2012  3152.998667  5277.410536
Spicy  2011          NaN  6295.032147

df_merged.plot(kind='bar')

[Image: bar chart with one color per city and (Taste, Year) tick labels]
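On current pandas versions, the per-city loop and the `reduce`/`merge` step can probably be collapsed into a single `unstack('City')`, which pivots the `City` level into columns and fills missing (Taste, Year, City) combinations with NaN, as the outer merge does. Here is a minimal sketch that checks this on a small, made-up stand-in for `DF` with the same `Taste`/`Year`/`City`/`Sold` columns; the row order of the two results can differ between pandas versions, so they are compared after sorting.

```python
from functools import reduce

import pandas as pd

# Small stand-in for DF with the same columns as in the question (values are made up)
DF = pd.DataFrame({
    'Taste': ['Sour', 'Sour', 'Spicy', 'Spicy', 'Sweet'],
    'Year':  ['2010', '2010', '2011', '2011', '2010'],
    'City':  ['ATL', 'MIA', 'ATL', 'ATL', 'MIA'],
    'Sold':  [1000.0, 2000.0, 1500.0, 500.0, 3000.0],
})

# One step: sum Sold per (Taste, Year, City), then move City into the columns.
# Missing (Taste, Year, City) combinations become NaN, as with the outer merge.
pivoted = DF.groupby(['Taste', 'Year', 'City'])['Sold'].sum().unstack('City')

# The loop-and-merge version from the answer, for comparison
df = (DF.groupby(['Taste', 'Year', 'City'])[['Sold']].sum()
        .reset_index().set_index(['Taste', 'Year']))
df_list = [df.loc[df.City == cty, ['Sold']].rename(columns={'Sold': cty}).reset_index()
           for cty in df.City.unique()]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Taste', 'Year'], how='outer'),
                   df_list)
df_merged = df_merged.set_index(['Taste', 'Year'])

print(pivoted)
```

`pivoted.plot(kind='bar')` then draws the same kind of chart as `df_merged.plot(kind='bar')`, with one color per city column.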


I have found a solution to my own question. Partial credit goes to @dermen, who answered first; my answer was inspired by his approach.

Although @dermen's solution is correct, I wanted a method where I don't have to manually adjust the width of the bars or worry about their positions.

The solution below can be adapted to an arbitrary number of cities and to the yearly data belonging to each city. Note that the DataFrame being plotted has multilevel (City, Year) columns. Because the colors are generated in a specific order and matched to the columns by position, the solution may break if the columns are reordered (for example, by sorting) so that they no longer follow that order.

[Image: bar chart in which all bars of the same city share one color]

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random

matplotlib.style.use('ggplot')


taste = ['Sweet','Spicy','Sour','Bitter']
store = ['Asian','Italian','American','Greek','Mexican']

df1 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})

df2 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})

df3 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})

df4 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})

df5 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})


df6 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})


df7 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})


df8 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})

df9 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                   'Store':[random.choice(store) for x in range(10)],
                   'Sold':1000+100*np.random.rand(10)})


df10 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                     'Store':[random.choice(store) for x in range(10)],
                     'Sold':1000+100*np.random.rand(10)})



df1['Year'] = '2010'
df1['City'] = 'MIA'

df2['Year'] = '2011'
df2['City'] = 'MIA'

df3['Year'] = '2012'
df3['City'] = 'MIA'

df4['Year'] = '2010'
df4['City'] = 'ATL'

df5['Year'] = '2011'
df5['City'] = 'ATL'

df6['Year'] = '2012'
df6['City'] = 'ATL'


df7['Year'] = '2013'
df7['City'] = 'ATL'

df8['Year'] = '2014'
df8['City'] = 'ATL'

df9['Year'] = '2013'
df9['City'] = 'CHI'

df10['Year'] = '2014'
df10['City'] = 'CHI'

DF = pd.concat([df1,df2,df3,df4,df5,df6,df7,df8,df9,df10])

DFG = DF.groupby(['Taste', 'Year', 'City'])
# Sum 'Sold' per group, then move City and Year into the columns (City on top)
DFGSum = DFG['Sold'].sum().unstack(['City', 'Year']).sort_index(axis=1)
# DFGSum has multilevel (City, Year) columns

import itertools
# ('axes.color_cycle' was replaced by 'axes.prop_cycle' in matplotlib 2.0)
color_cycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])

plot_colors = []  # sequence of colors, one per plotted column

for city in DFGSum.columns.get_level_values('City').unique():
    set_color = next(color_cycle)  # set the color for this city
    for year in DFGSum[city].columns.get_level_values('Year').unique():
        # every yearly column belonging to this city gets the same color
        plot_colors.append(set_color)

# the color parameter of DataFrame.plot accepts a sequence of colors, one per column
DFGSum.plot(kind='bar', color=plot_colors)
plt.show()
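Because `plot_colors` is matched to the columns purely by position, it only lines up if it is built in exactly the column order, which is the fragility mentioned above. A possible order-independent variant (a suggestion, not part of the original answer) is to build a city-to-color dict once and then look up each column's color from its `City` level value. The small `DFGSum` below is a made-up stand-in whose columns are deliberately not grouped by city.

```python
import itertools

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for DFGSum with (City, Year) columns; numbers are made up and the
# columns are intentionally NOT grouped by city
columns = pd.MultiIndex.from_tuples(
    [('MIA', '2010'), ('ATL', '2011'), ('CHI', '2013'), ('ATL', '2010')],
    names=['City', 'Year'])
DFGSum = pd.DataFrame([[3124.6, 3149.8, 2042.3, 3159.9],
                       [2084.3, 3204.7, 3065.8, 1078.9]],
                      index=pd.Index(['Bitter', 'Sour'], name='Taste'),
                      columns=columns)

# One color per city, from the current matplotlib color cycle
color_cycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
city_colors = {city: next(color_cycle)
               for city in DFGSum.columns.get_level_values('City').unique()}

# Look up each column's color from its City value, so column order does not matter
plot_colors = [city_colors[city] for city in DFGSum.columns.get_level_values('City')]

ax = DFGSum.plot(kind='bar', color=plot_colors)
```

Call `plt.show()` to display the result. Since every color is looked up per column, sorting or reordering `DFGSum.columns` before building `plot_colors` keeps each city's bars in the same color.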