115
votes

I want to subtract dates in 'A' from dates in 'B' and add a new column with the difference.

df
          A        B
one 2014-01-01  2014-02-28 
two 2014-02-03  2014-03-01

I've tried the following, but get an error when I try to include this in a for loop...

import datetime
date1=df['A'][0]
date2=df['B'][0]
mdate1 = datetime.datetime.strptime(date1, "%Y-%m-%d").date()
rdate1 = datetime.datetime.strptime(date2, "%Y-%m-%d").date()
delta =  (mdate1 - rdate1).days
print delta

What should I do?

4

4 Answers

112
votes

Assuming these were datetime columns (if they're not apply to_datetime) you can just subtract them:

df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])

In [11]: df.dtypes  # if already datetime64 you don't need to use to_datetime
Out[11]:
A    datetime64[ns]
B    datetime64[ns]
dtype: object

In [12]: df['A'] - df['B']
Out[12]:
one   -58 days
two   -26 days
dtype: timedelta64[ns]

In [13]: df['C'] = df['A'] - df['B']

In [14]: df
Out[14]:
             A          B        C
one 2014-01-01 2014-02-28 -58 days
two 2014-02-03 2014-03-01 -26 days

Note: ensure you're using a new of pandas (e.g. 0.13.1), this may not work in older versions.

137
votes

To remove the 'days' text element, you can also make use of the dt() accessor for series: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html

So,

df[['A','B']] = df[['A','B']].apply(pd.to_datetime) #if conversion required
df['C'] = (df['B'] - df['A']).dt.days

which returns:

             A          B   C
one 2014-01-01 2014-02-28  58
two 2014-02-03 2014-03-01  26
12
votes

A list comprehension is your best bet for the most Pythonic (and fastest) way to do this:

[int(i.days) for i in (df.B - df.A)]
  1. i will return the timedelta(e.g. '-58 days')
  2. i.days will return this value as a long integer value(e.g. -58L)
  3. int(i.days) will give you the -58 you seek.

If your columns aren't in datetime format. The shorter syntax would be: df.A = pd.to_datetime(df.A)

1
votes

How about this:

times['days_since'] = max(list(df.index.values))  
times['days_since'] = times['days_since'] - times['months']  
times