Python - Pandas - value_counts of all columns in a grouped dataframe

Question

I have a survey dataset with a 7 point scale for all questions and I want to get the value_counts of common values in all columns (and also group the dataframe by two columns). Let me show you a sample dataset and where I've reached so far.

| col1          | col2          | col3          | Building      | Levels_Name            |
|---------------|---------------|---------------|---------------|------------------------|
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor |
| Satisfied     | Satisfied     | NA            | Basingstoke   | Individual Contributor |
| Not Satisfied | Satisfied     | Not Satisfied | San Francisco | Middle Management      |
| Not Satisfied | Satisfied     | Not Satisfied | Miami         | Senior Leadership      |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City   | Senior Leadership      |
| NA            | NA            | NA            | Foster City   | Other                  |
| Not Satisfied | Not Satisfied | NA            | Foster City   | Senior Leadership      |
| Not Satisfied | Satisfied     | Not Satisfied | Austin        | Middle Management      |
| Satisfied     | Satisfied     | Satisfied     | San Francisco | Senior Leadership      |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City   | Individual Contributor |
| Satisfied     | Satisfied     | NA            | Miami         | Middle Management      |

Now I want to group this data set by 'Building' and 'Levels_Name' and add a new grouping for 'Satisfied','Not Satisfied','NA' and get the value counts for each columns.

So the result should look like the following:

| Building      | Levels_Name            | Sentiment     | col1 | col2 | col3 |
|---------------|------------------------|---------------|------|------|------|
| Foster City   | Individual Contributor | Not Satisfied | 1    | 1    | 1    |
| Foster City   | Individual Contributor | NA            | 0    | 0    | 0    |
| Foster City   | Individual Contributor | Satisfied     | 0    | 0    | 0    |
| Foster City   | Senior Leadership      | Not Satisfied | 2    | 2    | 0    |
| Foster City   | Senior Leadership      | NA            | 0    | 0    | 1    |
| Foster City   | Senior Leadership      | Satisfied     | 0    | 0    | 0    |
| San Francisco | Individual Contributor | Not Satisfied | 1    | 1    | 1    |
| San Francisco | Individual Contributor | NA            | 0    | 0    | 0    |
| San Francisco | Individual Contributor | Satisfied     | 0    | 0    | 0    |

Thank you!

piRSquared piRSquared · Accepted Answer · 2017-05-07T08:05:18

First, you want to melt the dataframe, then do a group by

d1 = pd.melt(
    df, ['Building', 'Levels_Name'], value_name='Sentiment'
).replace(np.nan, 'NaN')

d1.groupby(
    d1.columns.tolist()
).size().unstack('variable', fill_value=0).reset_index()

variable       Building             Levels_Name      Sentiment  col1  col2  col3
0                Austin       Middle Management  Not Satisfied     1     0     1
1                Austin       Middle Management      Satisfied     0     1     0
2           Basingstoke  Individual Contributor            NaN     0     0     1
3           Basingstoke  Individual Contributor      Satisfied     1     1     0
4           Foster City  Individual Contributor  Not Satisfied     1     1     1
5           Foster City                   Other            NaN     1     1     1
6           Foster City       Senior Leadership            NaN     0     0     1
7           Foster City       Senior Leadership  Not Satisfied     2     2     1
8                 Miami       Middle Management            NaN     0     0     1
9                 Miami       Middle Management      Satisfied     1     1     0
10                Miami       Senior Leadership  Not Satisfied     1     0     1
11                Miami       Senior Leadership      Satisfied     0     1     0
12        San Francisco  Individual Contributor  Not Satisfied     1     1     1
13        San Francisco       Middle Management  Not Satisfied     1     0     1
14        San Francisco       Middle Management      Satisfied     0     1     0
15        San Francisco       Senior Leadership      Satisfied     1     1     1

Python - Pandas - value_counts of all columns in a grouped dataframe

1 Answers