I have dataframe like this:
STOREID VARIANT_ARTICLE PO_DATE UNITSUM
0 st123 12345 20200427 9.0
1 st123 12345 20200428 3.0
2 st123 12345 20200429 13.0
3 st123 12345 20200430 7.0
4 st123 12345 20200501 16.0
5 st123 12345 20200502 3.0
6 st123 12345 20200503 5.0
7 st123 12345 20200504 10.0
8 st123 12345 20200505 3.0
9 st123 12345 20200506 7.0
10 st123 12345 20200507 29.0
11 st123 12345 20200508 4.0
12 st123 12345 20200509 9.0
13 st123 12345 20200510 8.0
14 st123 12345 20200511 5.0
15 st123 12345 20200513 8.0
16 st123 12345 20200514 2.0
17 st123 12345 20200515 2.0
18 st123 12345 20200516 2.0
I want to calculate rolling
sum
and avg
of the UNITSUM
column.
The catch here is that I need to calculate it for past 4 days (for example) and not previous 4 records which essentially means, for the 15th
row in my example, the date range to aggregate for is 20200510 - 20200513
. Since there is no entry for 20200512
, we do the aggregation on the 3 available rows and don't include 20200509
in our calculation (like pandas
does in rolling function).
Is there a way to achieve that?
EDIT: I have to achieve this using dask-dataframe
API.