Filling NaN values in pandas after grouping

Question

This question is slightly different from the usual filling of NaN values.

Suppose I have a dataframe that I group by some category. I want to fill the NaN values of one column using the group mean of a different column. Here is an example:

import numpy as np
import pandas as pd

a = pd.DataFrame({
    'Occupation': ['driver', 'driver', 'mechanic', 'teacher', 'mechanic', 'teacher',
                   'unemployed', 'driver', 'mechanic', 'teacher'],
    'salary': [100, 150, 70, 300, 90, 250, 10, 90, 110, 350],
    'expenditure': [20, 40, 10, 100, np.nan, 80, 0, np.nan, 40, 120]})
a['diff'] = a.salary - a.expenditure

    Occupation  salary  expenditure diff
0   driver      100     20.0        80.0
1   driver      150     40.0        110.0
2   mechanic    70      10.0        60.0
3   teacher     300     100.0       200.0
4   mechanic    90      NaN         NaN
5   teacher     250     80.0        170.0
6   unemployed  10      0.0         10.0
7   driver      90      NaN         NaN
8   mechanic    110     40.0        70.0
9   teacher     350     120.0       230.0

So, in the above case, I would like to fill the NaN values in expenditure as: salary - mean(difference) for each group.

How do I do that using pandas?

See I am trying to fill it with the salary at that particular point - mean of difference for that group. I might have phrased that incorrectly in post, let me edit it. — Rishabh Rao

RichieV · Accepted Answer · 2020-09-06T04:16:05

You can create a new series holding the desired values with groupby.transform, then use it to update the target column.
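As a quick illustration of why transform fits here, the sketch below compares a plain group mean with transform('mean') on a toy frame. The frame and the names df, g and v are hypothetical and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame, not the data from the question.
df = pd.DataFrame({'g': ['x', 'x', 'y'], 'v': [1.0, 3.0, 10.0]})

# Plain aggregation: one value per group, indexed by the group key.
per_group = df.groupby('g')['v'].mean()

# transform: one value per row, aligned with the original index.
per_row = df.groupby('g')['v'].transform('mean')

print(per_group.to_dict())  # {'x': 2.0, 'y': 10.0}
print(per_row.tolist())     # [2.0, 2.0, 10.0]
```

Because the transformed series has the same index as the dataframe, it can be combined row by row with other columns such as salary.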

Assuming you want to group by Occupation

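# Broadcast each group's mean diff back onto its rows.
a['mean_diff'] = a.groupby('Occupation')['diff'].transform('mean')
# Assign the result back; a chained inplace=True on a.expenditure
# may not update the DataFrame when copy-on-write is enabled.
a['expenditure'] = a['expenditure'].mask(
    a['expenditure'].isna(),
    a['salary'] - a['mean_diff']
)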

Output

   Occupation  salary  expenditure   diff  mean_diff
0      driver     100         20.0   80.0       95.0
1      driver     150         40.0  110.0       95.0
2    mechanic      70         10.0   60.0       65.0
3     teacher     300        100.0  200.0      200.0
4    mechanic      90         25.0    NaN       65.0
5     teacher     250         80.0  170.0      200.0
6  unemployed      10          0.0   10.0       10.0
7      driver      90         -5.0    NaN       95.0
8    mechanic     110         40.0   70.0       65.0
9     teacher     350        120.0  230.0      200.0
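
If you would rather not keep a helper column, a possible variant (a sketch only, assuming the same sample data and grouping by Occupation) holds the group means in a local series and uses fillna instead of mask. The variable name mean_diff and the final recomputation of diff are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
a = pd.DataFrame({
    'Occupation': ['driver', 'driver', 'mechanic', 'teacher', 'mechanic', 'teacher',
                   'unemployed', 'driver', 'mechanic', 'teacher'],
    'salary': [100, 150, 70, 300, 90, 250, 10, 90, 110, 350],
    'expenditure': [20, 40, 10, 100, np.nan, 80, 0, np.nan, 40, 120],
})
a['diff'] = a['salary'] - a['expenditure']

# Per-row group mean of diff; mean skips NaN entries by default.
mean_diff = a.groupby('Occupation')['diff'].transform('mean')

# fillna with a Series aligns on the index, so only NaN rows are replaced.
a['expenditure'] = a['expenditure'].fillna(a['salary'] - mean_diff)

# Optional: recompute diff so the filled rows no longer show NaN.
a['diff'] = a['salary'] - a['expenditure']
```

With this data, rows 4 and 7 end up with the same expenditure values as in the output above (25.0 and -5.0). Note that if every diff in a group is NaN, the group mean is NaN too and the expenditure for that group stays unfilled.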
