2
votes

Given the following dataFrame (subset of many areas and years, not the same years range per area) :

Area Year Food Aid (kg) Population Malnutrition Malnutrition rate (%)
Afghanistan 2013 128238 32269.589 8600.00000 26.65000
2014 57214 33370.79400 8800.00000 26.37000
Algeria 2013 35234 38140.13300 1300.00000 3.41000
2014 18980 38923.69200 1300.00000 3.34000
2015 17424 39728.02500 1300.00000 3.27000
2016 9476 40551.39200 1300.00000 ​ 3.21000

Created with a groupby from a dataset which contained data for each country and each available year ; then added the calculated row :

df.groupby(['Area', 'Year']).agg(
    {
        'Food Aid (kg)' : 'sum',
        'Population' : lambda x: x.iloc[0],
        'Malnutrition': lambda x: x.iloc[0]
    })

df['Malnutrition rate (%)'] = round(((df.loc[:,'Malnutrition'] / df.loc[:,'Population']))*100,2)

I would like to add 2 aggregated columns as such. For a country, given a year N, the percentage of evolution since N-1 of :

  • Food Aid
  • Malnutrition

Something like :

Area Year Food Aid (kg) Population Malnutrition Malnutrition rate (%) Food Aid evolution (%) Malnutrition rate evolution (%)
Afghanistan 2013 128238 32269.589 8600.00000 26.65000 --- ---
2014 57214 33370.79400 8800.00000 26.37000 -55.38 -1.051
Algeria 2013 35234 38140.13300 1300.00000 3.41000 --- ---
2014 18980 38923.69200 1300.00000 3.34000 -46.132 -2.053
2015 17424 39728.02500 1300.00000 3.27000 -8.198 -2.096
2016 9476 40551.39200 1300.00000 ​ 3.21000 -45.615 -1.835

I'm quite a beginner in Pandas. I've tried several things without much success. For example, this loop I'm not sure how to use to make things work:

for key, value in df:
    # key is an array of grouped keys. Examples: (Afghanistan, 2013), (Afghanistan, 2014)
    # value is a Series of the various cols for the current row

Question: in a dataframe grouped by 2 fields: how to add a new column which contains a calculation between the rows n and n-1 of the 2nd grouping field, inside the 1st grouping field ?

2

2 Answers

3
votes

Use pct_change:

new_cols = {'Food Aid (kg)': 'Food Aid evolution (%)',
            'Malnutrition rate (%)': 'Malnutrition rate evolution (%)'}

out = df.join(df.groupby('Area')[['Food Aid (kg)', 'Malnutrition rate (%)']]
                .pct_change().mul(100).rename(columns=new_cols))
>>> out[new_cols.values()]

                  Food Aid evolution (%)  Malnutrition rate evolution (%)
Area        Year
Afghanistan 2013                     NaN                              NaN
            2014              -55.384519                        -1.050657
Algeria     2013                     NaN                              NaN
            2014              -46.131577                        -2.052786
            2015               -8.198103                        -2.095808
            2016              -45.615243                        -1.834862
3
votes

Look at pct_change

df.groupby(level=0).agg({'Food Aid (kg)': 'pct_change',
                         'Malnutrition rate (%)': 'pct_change'})*100

             Food Aid (kg)  Malnutrition rate (%)
Area                                             
Afghanistan            NaN                    NaN
Afghanistan     -55.384519              -1.050657
Algeria                NaN                    NaN
Algeria         -46.131577              -2.052786
Algeria          -8.198103              -2.095808
Algeria         -45.615243              -1.834862

This assumes that the years you want to compare are ordered and sequential