1
votes

I have the following dataset:

SessionId    Query
   1           a   
   1           b
   2           a
   3           b
   3           b
   3           c
   3           a

I want to display a stacked bar chart with one bar per session. Each bar should be split into differently colored segments, one for each Query in that session, so that the total size of a bar equals the number of queries in that session.

I tried something like this:

result = data.groupby('SessionId').apply(
   lambda group: (
      group.groupby('Query').apply(
         lambda queryGroup: (
            queryGroup.count()
         )                
      )
   )
 ) 

But this produces a strange nested result (a table inside a table).

2
You should group by both columns and use the .size() aggregation (if I recall correctly). Then you'll have the data you need for the bar plot. - Shovalt
@Shovalt: this would look something like data.groupby(['SessionId', 'Query']).size().unstack().plot.barh(stacked=True), I guess. - stephan
@stephan: exactly what I meant, I was on mobile and couldn't test :) - Shovalt
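
For reference, the approach suggested in these comments can be sketched as follows. This is a minimal sketch that rebuilds the question's sample data by hand; the variable name `counts` and the `fill_value=0` argument (which replaces missing session/query combinations with zero instead of NaN) are additions for illustration, not part of the original comments.

```python
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({
    "SessionId": [1, 1, 2, 3, 3, 3, 3],
    "Query": ["a", "b", "a", "b", "b", "c", "a"],
})

# Count rows per (SessionId, Query) pair, then move Query into columns.
# fill_value=0 is an assumption: it makes absent combinations 0 rather than NaN.
counts = data.groupby(["SessionId", "Query"]).size().unstack(fill_value=0)
print(counts)

# With matplotlib installed, the chart itself would be drawn with:
# counts.plot.barh(stacked=True)
```

Each row of `counts` is one session and each column is one query, which is the layout pandas expects for a stacked bar plot.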

2 Answers

7
votes

crosstab should do the job if I understand your question correctly:

import pandas as pd

data = pd.DataFrame({'SessionId': [1, 1, 2, 3, 3, 3, 3], 
                     'Query': ['a', 'b', 'a', 'b', 'b', 'c', 'a']})

pd.crosstab(data.SessionId, data.Query).plot.barh(stacked=True)
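
To see what is being plotted, it may help to inspect the intermediate table before calling the plot method. The sketch below assumes the same sample data; the name `table` is introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({"SessionId": [1, 1, 2, 3, 3, 3, 3],
                     "Query": ["a", "b", "a", "b", "b", "c", "a"]})

# crosstab counts how often each Query occurs within each SessionId;
# combinations that never occur are filled with 0.
table = pd.crosstab(data.SessionId, data.Query)
print(table)

# Row totals give the overall length of each stacked bar
# (the number of queries in that session).
print(table.sum(axis=1))
```

Each column of `table` becomes one colored segment, and each row becomes one bar.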


0
votes

A pandas stacked bar chart plots one stacked segment per column, so you need to pivot your data so that each query becomes a column and each row holds the counts for one session.

Try this:

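import pandas as pd
# Long-form table: one row per (session, query) pair with its count
df = pd.DataFrame({"session": [1, 1, 2, 2, 3, 3],
                   "query": list("ababab"), "count": [5, 7, 32, 5, 8, 1]})
# Recent pandas versions require keyword arguments for pivot
df.pivot(index="session", columns="query", values="count").plot(kind="bar", stacked=True)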
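
The example above starts from a table that already has a count column, whereas the data in the question has one row per query. One possible way to bridge the two, sketched here under the assumption that the raw data looks like the question's sample, is to count the rows first and then pivot. The names `long_counts` and `wide` are hypothetical.

```python
import pandas as pd

# Raw data as given in the question: one row per issued query
data = pd.DataFrame({"SessionId": [1, 1, 2, 3, 3, 3, 3],
                     "Query": ["a", "b", "a", "b", "b", "c", "a"]})

# Step 1: build a long-form table with an explicit "count" column
long_counts = (data.groupby(["SessionId", "Query"])
                   .size()
                   .reset_index(name="count"))

# Step 2: pivot so that each query becomes a column; missing
# session/query pairs come out as NaN, so fill them with 0
wide = (long_counts.pivot(index="SessionId", columns="Query", values="count")
                   .fillna(0)
                   .astype(int))
print(wide)

# With matplotlib installed, the chart would be drawn with:
# wide.plot(kind="bar", stacked=True)
```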
