0
votes

This should be simple, but it took me far too many lines to get working:

I have a pandas data frame indexed by id and year.

For groups A and B, I want to backfill the 2022 value into 2021 only (the 2019 values should be left untouched). See the table below.

This worked in the end, but it is cumbersome:

df = df.swaplevel()
# create a frame with backfilled values
dfbf = df.loc[[2021,2022]].groupby('id')['value'].bfill(limit=1).to_frame()
df = df.join(dfbf, rsuffix = '_fill')
df = df.reset_index(level = "id")
df['value'].loc[2021] = df.loc[2021][['value', 'value_fill']].sum(axis = 1)
df = df.set_index(['id'], append=True).swaplevel()

Example (input on the left, desired result on the right):

id  year  value (input)  value (desired)
A   2019  NaN            NaN
A   2020  1              1
A   2021  NaN            3
A   2022  3              3
B   2019  NaN            NaN
B   2020  12             12
B   2021  NaN            11
B   2022  11             11
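
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the input frame from the table above. It assumes the blank cells are NaN and that the index levels are named id and year, as in the description; the variable name index is only for illustration.

```python
import numpy as np
import pandas as pd

# Two-level index: every id paired with every year.
index = pd.MultiIndex.from_product(
    [["A", "B"], [2019, 2020, 2021, 2022]], names=["id", "year"]
)

# Assumption: blank cells in the example table are missing values (NaN).
df = pd.DataFrame(
    {"value": [np.nan, 1, np.nan, 3, np.nan, 12, np.nan, 11]}, index=index
)
print(df)
```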

2 Answers

1
votes

Select the 2021 and 2022 rows first with DataFrame.loc, backfill them per group with GroupBy.bfill, and write the result back with DataFrame.update:

df.update(df.loc[:, [2021,2022], :].groupby('id')['value'].bfill(limit=1))
print (df)
         value
id year       
A  2019    NaN
   2020    1.0
   2021    3.0
   2022    3.0
B  2019    NaN
   2020   12.0
   2021   11.0
   2022   11.0
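
The same idea, spelled out step by step as a self-contained sketch. The frame is rebuilt from the question's table (blank cells assumed to be NaN), the row selection uses the explicit tuple form of loc, and the names subset and filled are only for illustration.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B"], [2019, 2020, 2021, 2022]], names=["id", "year"]
)
df = pd.DataFrame(
    {"value": [np.nan, 1, np.nan, 3, np.nan, 12, np.nan, 11]}, index=index
)

# Restrict to 2021 and 2022 so that earlier years can never receive a fill.
subset = df.loc[(slice(None), [2021, 2022]), "value"]

# Within each id, pull the 2022 value back by at most one row.
filled = subset.groupby(level="id").bfill(limit=1)

# update() aligns on the index and overwrites in place with the non-NaN values.
df.update(filled)
print(df)
```

Because update() only copies non-NaN values and aligns on the index, rows outside the selection (2019 and 2020 here) keep whatever they had.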

Or use a boolean mask for the rows you want to change. The mask is applied on both sides of the assignment, which improves performance because only the selected rows are processed, not all rows:

# include only the years 2021 and 2022
m = df.index.get_level_values('year').isin([2021,2022])
# alternative: exclude 2019 instead
#m = df.index.get_level_values('year') != 2019
df.loc[m, 'value'] = df[m].groupby('id')['value'].bfill(limit=1)
print (df)
         value
id year       
A  2019    NaN
   2020    1.0
   2021    3.0
   2022    3.0
B  2019    NaN
   2020   12.0
   2021   11.0
   2022   11.0
0
votes

You can simply backfill per group and assign the result with loc only to rows whose 'year' level is not 2019, so the 2019 values stay untouched:

df.loc[df.index.get_level_values('year')!=2019] = df.groupby(level=0).bfill()

output:

         value
id year       
A  2019    NaN
   2020    1.0
   2021    3.0
   2022    3.0
B  2019    NaN
   2020   12.0
   2021   11.0
   2022   11.0
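
One thing to be aware of: bfill() without a limit, combined with a mask that only excludes 2019, gives the same result on the question's data, but the two approaches can differ on other data. A small sketch with hypothetical data (2020 is missing as well, which is not the case in the question) shows the difference; the names unlimited and restricted are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: unlike the question's frame, 2020 is missing too.
index = pd.MultiIndex.from_product(
    [["A"], [2019, 2020, 2021, 2022]], names=["id", "year"]
)
df = pd.DataFrame({"value": [np.nan, np.nan, np.nan, 3.0]}, index=index)
year = df.index.get_level_values("year")

# Exclude 2019 and backfill without a limit: the 2022 value also reaches 2020.
unlimited = df.copy()
unlimited.loc[year != 2019, "value"] = (
    unlimited.groupby(level="id")["value"].bfill()
)

# Keep only 2021 and 2022 with limit=1: the 2022 value reaches 2021 and stops.
restricted = df.copy()
m = year.isin([2021, 2022])
restricted.loc[m, "value"] = (
    restricted[m].groupby(level="id")["value"].bfill(limit=1)
)

print(unlimited)
print(restricted)
```

If only the 2022 value should ever move, and only into 2021, the restricted variant is the safer choice.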