I'm looking for a way to do something like the various rolling_*
functions of pandas
, but I want the window of the rolling computation to be defined by a range of values (say, a range of values of a column of the DataFrame), not by the number of rows in the window.
As an example, suppose I have this data:
>>> print d
RollBasis ToRoll
0 1 1
1 1 4
2 1 -5
3 2 2
4 3 -4
5 5 -2
6 8 0
7 10 -13
8 12 -2
9 13 -5
If I do something like rolling_sum(d, 5)
, I get a rolling sum in which each window contains 5 rows. But what I want is a rolling sum in which each window contains a certain range of values of RollBasis
. That is, I'd like to be able to do something like d.roll_by(sum, 'RollBasis', 5)
, and get a result where the first window contains all rows whose RollBasis
is between 1 and 5, then the second window contains all rows whose RollBasis
is between 2 and 6, then the third window contains all rows whose RollBasis
is between 3 and 7, etc. The windows will not have equal numbers of rows, but the range of RollBasis
values selected in each window will be the same. So the output should be like:
>>> d.roll_by(sum, 'RollBasis', 5)
1 -4 # sum of elements with 1 <= Rollbasis <= 5
2 -4 # sum of elements with 2 <= Rollbasis <= 6
3 -6 # sum of elements with 3 <= Rollbasis <= 7
4 -2 # sum of elements with 4 <= Rollbasis <= 8
# etc.
I can't do this with groupby
, because groupby
always produces disjoint groups. I can't do it with the rolling functions, because their windows always roll by number of rows, not by values. So how can I do it?