backfill values for only selected members of group - pandas

Question

This should be simple, but took me too many rows to complete:

I have a pandas data frame indexed by id and year.

For the groups A and B, I want to backfill only the values of 2022 to 2021 (2019 values should be left untouched). See table below.

This worked in the end, but cumbersome:

df = df.swaplevel()
# create a frame with backfilled values
dfbf = df.loc[[2021,2022]].groupby('id')['value'].bfill(limit=1).to_frame()
df = df.join(dfbf, rsuffix = '_fill')
df = df.reset_index(level = "id")
df['value'].loc[2021] = df.loc[2021][['value', 'value_fill']].sum(axis = 1)
df = df.set_index(['id'], append=True).swaplevel()

Example:

		value			value
id	year		id	year
A	2019		A	2019
A	2020	1	A	2020	1
A	2021		A	2021	3
A	2022	3	A	2022	3
B	2019		B	2019
B	2020	12	B	2020	12
B	2021		B	2021	11
B	2022	11	B	2022	11

jezrael jezrael · Accepted Answer · 2021-09-08T07:37:43

Select rows by first by DataFrame.loc with GroupBy.bfill with DataFrame.update:

df.update(df.loc[:, [2021,2022], :].groupby('id')['value'].bfill(limit=1))
print (df)
         value
id year       
A  2019    NaN
   2020    1.0
   2021    3.0
   2022    3.0
B  2019    NaN
   2020   12.0
   2021   11.0
   2022   11.0

Or use mask for expected rows (filtered in both sides for improve performance - processing only selected rows, not all rows):

#include
m = df.index.get_level_values('year').isin([2021,2022])
#exclude
#m = df.index.get_level_values('year') != 2019
df.loc[m, 'value'] = df[m].groupby('id')['value'].bfill(limit=1)
print (df)
         value
id year       
A  2019    NaN
   2020    1.0
   2021    3.0
   2022    3.0
B  2019    NaN
   2020   12.0
   2021   11.0
   2022   11.0

backfill values for only selected members of group - pandas

2 Answers