1
votes

I have a dataframe of the form:

df = pd.DataFrame({'Date':['2017-01-01', '2017-02-13', '2018-03-01', '2018-04-01'], 'Value':[1,2,3,4]})

And for each year I have a distinct date range (eg for 2017 from 2017-02-02 to 2017-02-15 and for 2018 from 2018-03-03 to 2018-04-04) stored as a dictionary.

dates_dict = {2017: ('2017-02-02', '2017-02-15'), 2018: ('2018-03-03', '2018-04-04')}

What I want to create is a new column in my dataframe which is True if the Date is within that years date range and False otherwise. For the given example the output would be:

df =    Date        Value  in_range
     0  2017-01-01  1      False
     1  2017-02-13  2      True
     2  2018-03-01  3      False
     3  2018-04-01  4      True

My current solution is:

temp = []
for name, group in df.groupby(df['Date'].dt.year):
    temp.append((group['Date'] >= dates_dict[name][0]) & (group['Date'] <= 
    dates_dict[name][1]))
in_range = pd.concat(temp)
in_range = in_range.rename('in_range')
df = df.merge(in_range.to_frame(), left_index=True, right_index=True)

This works but I'm sure there's a more concise way to achieve this. More generally is there a better way of checking whether a date is within a large list of date ranges?

2

2 Answers

2
votes

Setup

You can make your solution more efficient by converting your dictionary to actually contain a pd.date_range. Both of these solutions assume you make this transformation:

dates_dict = {k: pd.date_range(s, e) for k, (s, e) in dates_dict.items()}

Option 1
Using apply with a dictionary lookup:

df.Date.apply(lambda x: x in dates_dict[x.year], 1)

0    False
1     True
2    False
3     True
Name: Date, dtype: bool

Option 2
A slightly more performant option using a list comprehension:

df['in_range'] = [i in dates_dict[i.year] for i in df.Date]

        Date  Value  in_range
0 2017-01-01      1     False
1 2017-02-13      2      True
2 2018-03-01      3     False
3 2018-04-01      4      True

Timings

In [208]: %timeit df.Date.apply(lambda x: x in dates_dict[x.year], 1)
289 ms ± 5.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [209]: %timeit [i in dates_dict[i.year] for i in df.Date]
284 ms ± 6.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1
votes

you can use map to create a serie ser with the value from you dictionary for each Date and then use between such as:

ser = df.Date.dt.year.map(dates_dict)
df['in_range'] = df.Date.between(pd.to_datetime(ser.str[0]), pd.to_datetime(ser.str[1]))

and you get:

        Date  Value  in_range
0 2017-01-01      1     False
1 2017-02-13      2      True
2 2018-03-01      3     False
3 2018-04-01      4      True