I have a dataframe which looks like this
pd.DataFrame({'A': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C3', 'C3'],
...: 'date': [date(2019, 12, 31), date(2018, 12, 31), date(2017, 12, 31), date(2016, 12, 31), date(2017, 12, 31), date(2016, 12, 31), date(2018, 12, 31), date(2016, 12, 31)],
...: 'value': [9, 9, 8, 4, 8, 3, 6, 4]})
Out[13]:
A date value
0 C1 2019-12-31 9
1 C1 2018-12-31 9
2 C1 2017-12-31 8
3 C1 2016-12-31 4
4 C2 2017-12-31 8
5 C2 2016-12-31 3
6 C3 2018-12-31 6
7 C3 2016-12-31 4
first_year = date(2016, 12, 31)
last_year = date(2019, 12, 31)
For each group I need to add the missing years for each group in column 'A' and take the 'value' of the previous year. I would like to say via input variable what my first and last year should be. my resulting dataframe should look like this
A date value
0 C1 2019-12-31 9
1 C1 2018-12-31 9
2 C1 2017-12-31 8
3 C1 2016-12-31 4
4 C2 2019-12-31 8
5 C2 2018-12-31 8
6 C2 2017-12-31 8
7 C2 2016-12-31 3
8 C3 2019-12-31 6
9 C3 2018-12-31 6
10 C3 2017-12-31 4
11 C3 2016-12-31 4
following logic applies (by group in column A)
C1 = all years between 2016 and 2019 available already
C2 = years 2018 and 2019 missing, need to be added and get value from last available year in 2017 value = 8
C3 = year 2017 missing, gets value from year 2016. and year 2019 missing, gets value from 2018