4 votes

Say I have the following values:

                                   money_spent
time                 
2014-10-06 17:59:40.016000-04:00      1.832128
2014-10-06 17:59:41.771000-04:00      2.671048
2014-10-06 17:59:43.001000-04:00      2.019434
2014-10-06 17:59:44.792000-04:00      1.294051
2014-10-06 17:59:48.741000-04:00      0.867856
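
For reference, the frame can be reconstructed like this (a minimal sketch; the timestamps and values are copied verbatim from the table above):

import pandas as pd

times = pd.to_datetime([
    '2014-10-06 17:59:40.016-04:00',
    '2014-10-06 17:59:41.771-04:00',
    '2014-10-06 17:59:43.001-04:00',
    '2014-10-06 17:59:44.792-04:00',
    '2014-10-06 17:59:48.741-04:00',
])
df = pd.DataFrame(
    {'money_spent': [1.832128, 2.671048, 2.019434, 1.294051, 0.867856]},
    index=pd.DatetimeIndex(times, name='time'),
)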

I am hoping to measure how much money is spent every 2 seconds. More specifically, for every timestamp in the output, I need to see the money spent within the last 2 seconds.

When I do:

df.resample('2S', how='last')

I get:

                                money_spent
time               
2014-10-06 17:59:40-04:00          2.671048
2014-10-06 17:59:42-04:00          2.019434
2014-10-06 17:59:44-04:00          1.294051
2014-10-06 17:59:46-04:00               NaN
2014-10-06 17:59:48-04:00          0.867856

which is not what I would expect. To start with, note that the first entry in the resampled df is 2.671048, at time 17:59:40, even though, according to the original dataframe, no money had been spent yet at that time. How is that possible?


3 Answers

5 votes

Try using how=np.sum, so that each 2-second bin sums the amounts that fall inside it:

import numpy as np

df.resample('2S', how=np.sum, closed='left', label='right')
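
The how= keyword was deprecated in pandas 0.18 and later removed; in current versions the same resample is written as a method call. A minimal sketch, assuming the df from the question:

df.resample('2S', closed='left', label='right').sum()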

Edit:

As for closed and label:

It means each bin is a left-closed, 2-second interval, e.g. [1, 2) contains values like 1, 1.2, 1.5, 1.9 but not 2, and each bin is labeled with the date from the right end of its interval. And from the docs:

closed : {‘right’, ‘left’} Which side of bin interval is closed

label : {‘right’, ‘left’} Which bin edge label to label bucket with
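
With the sample data from the question, those settings put the first two rows into the left-closed bin [17:59:40, 17:59:42), which is labeled with its right edge, 17:59:42. Summing by hand, the output should look roughly like this (the empty [17:59:46, 17:59:48) bin comes out as 0.0 in recent pandas, or NaN in some older versions):

                            money_spent
time
2014-10-06 17:59:42-04:00      4.503176
2014-10-06 17:59:44-04:00      2.019434
2014-10-06 17:59:46-04:00      1.294051
2014-10-06 17:59:48-04:00      0.000000
2014-10-06 17:59:50-04:00      0.867856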

1 vote

You can add offsetting events to the frame, so that each dollar spent "leaves" the system two seconds after it arrives; then all you need is a cumulative sum.

There is a chance that an original event and a shifted exit event land on the same timestamp; in that case, after the very last step, you need to de-duplicate the time index, keeping the last value of money_spent for each duplicated timestamp (see the sketch at the end of this answer):

>>> df
                            money_spent
time                                   
2014-10-06 21:59:40.016000        1.832
2014-10-06 21:59:41.771000        2.671
2014-10-06 21:59:43.001000        2.019
2014-10-06 21:59:44.792000        1.294
2014-10-06 21:59:48.741000        0.868

>>> import numpy as np  # for the timedelta shift below
>>> xdf = df.copy()   # make a copy of the original frame
>>> xdf['money_spent'] *= -1  # negate the value of `money_spent`
>>> xdf.index += np.timedelta64(2, 's')  # shift the exit events 2 seconds later

Now concat with the original frame, sort the index, and take the cumulative sum:

>>> pd.concat([df, xdf]).sort_index().cumsum()
                            money_spent
2014-10-06 21:59:40.016000    1.832e+00
2014-10-06 21:59:41.771000    4.503e+00
2014-10-06 21:59:42.016000    2.671e+00
2014-10-06 21:59:43.001000    4.690e+00
2014-10-06 21:59:43.771000    2.019e+00
2014-10-06 21:59:44.792000    3.313e+00
2014-10-06 21:59:45.001000    1.294e+00
2014-10-06 21:59:46.792000   -4.441e-16
2014-10-06 21:59:48.741000    8.679e-01
2014-10-06 21:59:50.741000   -4.441e-16

Floating-point precision error shows up as very small values like -4.441e-16; otherwise the numbers look correct to me:

>>> _['money_spent'].round(15)
2014-10-06 21:59:40.016000    1.832
2014-10-06 21:59:41.771000    4.503
2014-10-06 21:59:42.016000    2.671
2014-10-06 21:59:43.001000    4.690
2014-10-06 21:59:43.771000    2.019
2014-10-06 21:59:44.792000    3.313
2014-10-06 21:59:45.001000    1.294
2014-10-06 21:59:46.792000   -0.000
2014-10-06 21:59:48.741000    0.868
2014-10-06 21:59:50.741000   -0.000
Name: money_spent, dtype: float64
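
As for the de-duplication mentioned at the start of this answer: in recent pandas versions you can keep only the last row per duplicated timestamp with index.duplicated. A minimal sketch, continuing from the frames above:

>>> result = pd.concat([df, xdf]).sort_index().cumsum()
>>> result = result[~result.index.duplicated(keep='last')]  # last money_spent per timestamp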
0 votes

The reason the first element in your resampled frame is 2.671048 is that you're using how='last'. With the default closed='left', label='left' for a 2-second frequency, the 17:59:40 row is the bin [17:59:40, 17:59:42), and its last observation is the one at 17:59:41.771. If you want the first resampled point to read 1.832128 instead, use the how='first' kwarg.
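
In current pandas versions, where how= is gone, the same two calls are written as methods. A minimal sketch, assuming the df from the question:

df.resample('2S').last()   # reproduces the output in the question
df.resample('2S').first()  # first observation per 2-second bin instead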