This question is slightly different from the usual questions about filling NaN values.
Suppose I have a DataFrame that I group by some category. I want to fill the NaN values in one column using the group mean of a different column. Here is an example:
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'Occupation': ['driver', 'driver', 'mechanic', 'teacher', 'mechanic', 'teacher',
                   'unemployed', 'driver', 'mechanic', 'teacher'],
    'salary': [100, 150, 70, 300, 90, 250, 10, 90, 110, 350],
    'expenditure': [20, 40, 10, 100, np.nan, 80, 0, np.nan, 40, 120]})
# diff is NaN wherever expenditure is NaN
a['diff'] = a.salary - a.expenditure
   Occupation  salary  expenditure   diff
0      driver     100         20.0   80.0
1      driver     150         40.0  110.0
2    mechanic      70         10.0   60.0
3     teacher     300        100.0  200.0
4    mechanic      90          NaN    NaN
5     teacher     250         80.0  170.0
6  unemployed      10          0.0   10.0
7      driver      90          NaN    NaN
8    mechanic     110         40.0   70.0
9     teacher     350        120.0  230.0
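So, in the above case, I would like to fill each NaN in expenditure with that row's salary minus the mean of diff for its Occupation group. For example, row 4 (mechanic) would become 90 - 65 = 25, since the mean diff for the other mechanics is (60 + 70) / 2 = 65.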
How do I do that using pandas?
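One possible approach, offered as a minimal sketch rather than the definitive way: compute the per-group mean of diff with `groupby(...).transform('mean')`, which returns a Series aligned to the original rows, and pass `salary` minus that Series to `fillna`. The name `group_mean_diff` is just an illustrative choice. This relies on the default behaviour of `mean` skipping NaN values, and it assumes every group has at least one non-NaN diff; a group with none would stay NaN.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'Occupation': ['driver', 'driver', 'mechanic', 'teacher', 'mechanic', 'teacher',
                   'unemployed', 'driver', 'mechanic', 'teacher'],
    'salary': [100, 150, 70, 300, 90, 250, 10, 90, 110, 350],
    'expenditure': [20, 40, 10, 100, np.nan, 80, 0, np.nan, 40, 120]})
a['diff'] = a.salary - a.expenditure

# Mean of diff within each Occupation, broadcast back to every row.
# NaN diffs are skipped when the mean is computed.
group_mean_diff = a.groupby('Occupation')['diff'].transform('mean')

# Only rows where expenditure is NaN are filled; fillna aligns on the index.
a['expenditure'] = a['expenditure'].fillna(a['salary'] - group_mean_diff)

# Optionally recompute diff so it is consistent with the filled values.
a['diff'] = a['salary'] - a['expenditure']
```

With the example data, row 4 (mechanic) is filled with 90 - 65 = 25 and row 7 (driver) with 90 - 95 = -5, so a group mean can produce a negative expenditure when a row's salary is below the group's average diff.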