Given the following dataFrame (subset of many areas and years, not the same years range per area) :
Area | Year | Food Aid (kg) | Population | Malnutrition | Malnutrition rate (%) |
---|---|---|---|---|---|
Afghanistan | 2013 | 128238 | 32269.589 | 8600.00000 | 26.65000 |
2014 | 57214 | 33370.79400 | 8800.00000 | 26.37000 | |
Algeria | 2013 | 35234 | 38140.13300 | 1300.00000 | 3.41000 |
2014 | 18980 | 38923.69200 | 1300.00000 | 3.34000 | |
2015 | 17424 | 39728.02500 | 1300.00000 | 3.27000 | |
2016 | 9476 | 40551.39200 | 1300.00000 | 3.21000 |
Created with a groupby from a dataset which contained data for each country and each available year ; then added the calculated row :
df.groupby(['Area', 'Year']).agg(
{
'Food Aid (kg)' : 'sum',
'Population' : lambda x: x.iloc[0],
'Malnutrition': lambda x: x.iloc[0]
})
df['Malnutrition rate (%)'] = round(((df.loc[:,'Malnutrition'] / df.loc[:,'Population']))*100,2)
I would like to add 2 aggregated columns as such. For a country, given a year N, the percentage of evolution since N-1 of :
- Food Aid
- Malnutrition
Something like :
Area | Year | Food Aid (kg) | Population | Malnutrition | Malnutrition rate (%) | Food Aid evolution (%) | Malnutrition rate evolution (%) |
---|---|---|---|---|---|---|---|
Afghanistan | 2013 | 128238 | 32269.589 | 8600.00000 | 26.65000 | --- | --- |
2014 | 57214 | 33370.79400 | 8800.00000 | 26.37000 | -55.38 | -1.051 | |
Algeria | 2013 | 35234 | 38140.13300 | 1300.00000 | 3.41000 | --- | --- |
2014 | 18980 | 38923.69200 | 1300.00000 | 3.34000 | -46.132 | -2.053 | |
2015 | 17424 | 39728.02500 | 1300.00000 | 3.27000 | -8.198 | -2.096 | |
2016 | 9476 | 40551.39200 | 1300.00000 | 3.21000 | -45.615 | -1.835 |
I'm quite a beginner in Pandas. I've tried several things without much success. For example, this loop I'm not sure how to use to make things work:
for key, value in df:
# key is an array of grouped keys. Examples: (Afghanistan, 2013), (Afghanistan, 2014)
# value is a Series of the various cols for the current row
Question: in a dataframe grouped by 2 fields: how to add a new column which contains a calculation between the rows n and n-1 of the 2nd grouping field, inside the 1st grouping field ?