24
votes

I have a sql file which consists of the data below which I read into pandas.

df = pandas.read_sql('Database count details', con=engine,
                     index_col='id', parse_dates='newest_available_date')

Output

id       code   newest_date_available
9793708  3514   2015-12-24
9792282  2399   2015-12-25
9797602  7452   2015-12-25
9804367  9736   2016-01-20
9804438  9870   2016-01-20

The next line of code is to get last week's date

date_before = datetime.date.today() - datetime.timedelta(days=7) # Which is 2016-01-20

What I am trying to do is, to compare date_before with df and print out all rows that is less than date_before

if (df['newest_available_date'] < date_before): print(#all rows)

Obviously this returns me an error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How should I do this?

2
Also ensure that your column is datetime type. Use df.dtypessushmit

2 Answers

37
votes

I would do a mask like:

a = df[df['newest_date_available'] < date_before]

If date_before = datetime.date(2016, 1, 19), this returns:

        id  code newest_date_available
0  9793708  3514            2015-12-24
1  9792282  2399            2015-12-25
2  9797602  7452            2015-12-25
14
votes

Using datetime.date(2019, 1, 10) works because pandas coerce the date to a date time under the hood. This however, will no longer be the case in future versions of pandas.

From version 0.24 and up, it now issue a warning:

FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the 'datetime.date' is coerced to a datetime. In the future pandas will not coerce, and a TypeError will be raised.

The better solution is the one proposed on its official documentation as Pandas replacement for python datetime.datetime object.

To provide an example referencing OP's initial dataset, this is how you would use it:

import pandas
cond1 = df.newest_date_available < pd.Timestamp(2016,1,10)
df.loc[cond1, ]