In a DataFrame grouped by 2 fields, calculate difference between rows n and n-1 from 2nd grouping field inside the 1st grouping field

Question

Given the following dataFrame (subset of many areas and years, not the same years range per area) :

Area	Year	Food Aid (kg)	Population	Malnutrition	Malnutrition rate (%)
Afghanistan	2013	128238	32269.589	8600.00000	26.65000
	2014	57214	33370.79400	8800.00000	26.37000
Algeria	2013	35234	38140.13300	1300.00000	3.41000
	2014	18980	38923.69200	1300.00000	3.34000
	2015	17424	39728.02500	1300.00000	3.27000
	2016	9476	40551.39200	1300.00000	3.21000

Created with a groupby from a dataset which contained data for each country and each available year ; then added the calculated row :

df.groupby(['Area', 'Year']).agg(
    {
        'Food Aid (kg)' : 'sum',
        'Population' : lambda x: x.iloc[0],
        'Malnutrition': lambda x: x.iloc[0]
    })

df['Malnutrition rate (%)'] = round(((df.loc[:,'Malnutrition'] / df.loc[:,'Population']))*100,2)

I would like to add 2 aggregated columns as such. For a country, given a year N, the percentage of evolution since N-1 of :

Food Aid
Malnutrition

Something like :

Area	Year	Food Aid (kg)	Population	Malnutrition	Malnutrition rate (%)	Food Aid evolution (%)	Malnutrition rate evolution (%)
Afghanistan	2013	128238	32269.589	8600.00000	26.65000	---	---
	2014	57214	33370.79400	8800.00000	26.37000	-55.38	-1.051
Algeria	2013	35234	38140.13300	1300.00000	3.41000	---	---
	2014	18980	38923.69200	1300.00000	3.34000	-46.132	-2.053
	2015	17424	39728.02500	1300.00000	3.27000	-8.198	-2.096
	2016	9476	40551.39200	1300.00000	3.21000	-45.615	-1.835

I'm quite a beginner in Pandas. I've tried several things without much success. For example, this loop I'm not sure how to use to make things work:

for key, value in df:
    # key is an array of grouped keys. Examples: (Afghanistan, 2013), (Afghanistan, 2014)
    # value is a Series of the various cols for the current row

Question: in a dataframe grouped by 2 fields: how to add a new column which contains a calculation between the rows n and n-1 of the 2nd grouping field, inside the 1st grouping field ?

Corralien Corralien · Accepted Answer · 2021-08-12T20:11:14

Use pct_change:

new_cols = {'Food Aid (kg)': 'Food Aid evolution (%)',
            'Malnutrition rate (%)': 'Malnutrition rate evolution (%)'}

out = df.join(df.groupby('Area')[['Food Aid (kg)', 'Malnutrition rate (%)']]
                .pct_change().mul(100).rename(columns=new_cols))

>>> out[new_cols.values()]

                  Food Aid evolution (%)  Malnutrition rate evolution (%)
Area        Year
Afghanistan 2013                     NaN                              NaN
            2014              -55.384519                        -1.050657
Algeria     2013                     NaN                              NaN
            2014              -46.131577                        -2.052786
            2015               -8.198103                        -2.095808
            2016              -45.615243                        -1.834862

In a DataFrame grouped by 2 fields, calculate difference between rows n and n-1 from 2nd grouping field inside the 1st grouping field

2 Answers