Numpy searchsorted
This is the first thing I thought of but it wasn't trivial to iron out. @Vaishali's answer is in spirit very similar to this and simpler. But I'm like a dog with a bone and I can't let it go until I figure it out.
To explain a little bit. searchsorted
will go through an array, In this case the equally spaced dates, and find where in another array they would be placed in order to maintain sortedness. This sounds complicated but if we visualize, we can see what is going on. I'll use letters to demonstrate. I'll choose the letters to correspond with the dates.
x = np.array([*'abdg'])
y = np.array([*'acdef'])
Notice that for each letter in x
I found where the backstop was in y
# i -> 0 0 2 4
# x -> a b d g
# y -> a c d e f
This works out to what I do below.
Setup
df = pd.DataFrame(dict(
date=pd.to_datetime(['2016-03-21', '2016-03-25', '2016-04-10', '2016-05-05']),
value=[10, 10, 10, 10]
))
dates = pd.date_range(df.date.min(), df.date.max(), freq='10D')
Solution
d = df.date.values
v = df.value.values
i = dates.searchsorted(d, side='right') - 1
a = np.zeros(len(dates), dtype=v.dtype)
np.add.at(a, i, v)
pd.DataFrame(dict(
date=dates, value=a
))
date value
0 2016-03-21 20
1 2016-03-31 0
2 2016-04-10 10
3 2016-04-20 0
4 2016-04-30 10
You'll notice I used np.add.at
inorder to sum v
at just the right spots. I could have also done this with np.bincount
. I like the approach above better because np.bincount
casts to float
even though the v
is of type int
.
d = df.date.values
v = df.value.values
i = dates.searchsorted(d, side='right') - 1
pd.DataFrame(dict(
date=dates, value=np.bincount(i, v).astype(v.dtype)
))
date value
0 2016-03-21 20
1 2016-03-31 0
2 2016-04-10 10
3 2016-04-20 0
4 2016-04-30 10