42
votes

I have a datetime column like this:

>>> df['ACC_DATE'].head(2)
538   2006-04-07
550   2006-04-12
Name: ACC_DATE, dtype: datetime64[ns]

Now I want to subtract a year from each row of this column. How can I do this, and which library should I use?

The expected result:

        ACC_DATE    NEW_DATE
538   2006-04-07  2005-04-07
550   2006-04-12  2005-04-12
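To try the answers below, here is a minimal sketch that rebuilds the sample column. It assumes that 538 and 550 are ordinary integer index labels; the dates are copied from the question.

```python
import pandas as pd

# Sample data from the question; 538 and 550 are assumed to be plain integer index labels.
df = pd.DataFrame(
    {"ACC_DATE": pd.to_datetime(["2006-04-07", "2006-04-12"])},
    index=[538, 550],
)

print(df["ACC_DATE"].head(2))
```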

4 Answers

82
votes

You can use DateOffset to achieve this:

In[88]:
df['NEW_DATE'] = df['ACC_DATE'] - pd.DateOffset(years=1)
df

Out[88]: 
        ACC_DATE   NEW_DATE
index                      
538   2006-04-07 2005-04-07
550   2006-04-12 2005-04-12
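Because DateOffset(years=1) shifts by calendar years rather than a fixed number of days, it also copes with leap days: a 29 February date moves to 28 February of the previous year. A quick check, with a hypothetical 2008-02-29 row added to the question's dates:

```python
import pandas as pd

# The question's dates plus a hypothetical leap-day date.
df = pd.DataFrame(
    {"ACC_DATE": pd.to_datetime(["2006-04-07", "2006-04-12", "2008-02-29"])}
)

# Calendar-aware shift; 2007 has no 29 February, so that row lands on 2007-02-28.
df["NEW_DATE"] = df["ACC_DATE"] - pd.DateOffset(years=1)
print(df)
```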
18
votes

Use DateOffset:

df["NEW_DATE"] = df["ACC_DATE"] - pd.offsets.DateOffset(years=1)
print(df)
        ACC_DATE   NEW_DATE
index                      
538   2006-04-07 2005-04-07
550   2006-04-12 2005-04-12
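pd.offsets.DateOffset and pd.DateOffset name the same class (pandas re-exports it at the top level), so this answer and the previous one are equivalent. Adding a negative offset works too; a small sketch checking both points on the question's dates:

```python
import pandas as pd

# Both names refer to one class.
print(pd.DateOffset is pd.offsets.DateOffset)  # True

s = pd.Series(pd.to_datetime(["2006-04-07", "2006-04-12"]))

# Subtracting one year and adding minus one year give identical results.
minus = s - pd.offsets.DateOffset(years=1)
plus_negative = s + pd.offsets.DateOffset(years=-1)
print(minus.equals(plus_negative))  # True
```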
12
votes

You could use pd.Timedelta:

df["NEW_DATE"] = df["ACC_DATE"] - pd.Timedelta(days=365) 
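The catch with a fixed Timedelta is that 365 days equals one calendar year only when no 29 February falls inside the span. A quick check using the question's first date and a hypothetical date just after a leap day:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2006-04-07", "2008-03-01"]))

# Fixed-length shift: 2007-03-01 to 2008-03-01 is 366 days,
# so going back 365 days stops one day short of 2007-03-01.
shifted = s - pd.Timedelta(days=365)
print([str(d.date()) for d in shifted])  # ['2005-04-07', '2007-03-02']
```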

Or use Timestamp.replace:

df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x.replace(year=x.year - 1))
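If you prefer to keep replace, 29 February needs explicit handling, since that day does not exist in a non-leap year. A minimal sketch that clamps it to 28 February (the same result relativedelta gives) before calling replace; minus_years is a hypothetical helper name, not a pandas function:

```python
import calendar

import pandas as pd


def minus_years(ts, n=1):
    """Hypothetical helper: move ts back n calendar years using replace.

    29 February is clamped to 28 February when the target year is not a leap year.
    """
    year = ts.year - n
    day = ts.day
    if ts.month == 2 and day == 29 and not calendar.isleap(year):
        day = 28
    return ts.replace(year=year, day=day)


df = pd.DataFrame({"ACC_DATE": pd.to_datetime(["2006-04-07", "2008-02-29"])})
df["NEW_DATE"] = df["ACC_DATE"].apply(minus_years)
print(df)
```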

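But neither handles leap years correctly: 365 days is one day short of a calendar year whenever the span includes 29 February, and replace has no valid target for 29 February in a non-leap year. You could use dateutil.relativedelta instead:

from dateutil.relativedelta import relativedelta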

df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x - relativedelta(years=1))
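The pandas documentation describes the keyword form of DateOffset as working like relativedelta, so on leap-day edge cases this element-wise version should agree with the vectorized DateOffset answers. A quick comparison using the question's first date plus hypothetical dates on and after 29 February 2008:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Question's first date plus hypothetical dates on and after a leap day.
df = pd.DataFrame({"ACC_DATE": pd.to_datetime(["2006-04-07", "2008-02-29", "2008-03-01"])})

# Element-wise, as in the answer above.
df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x - relativedelta(years=1))

# Vectorized DateOffset, for comparison.
via_offset = df["ACC_DATE"] - pd.DateOffset(years=1)

print(df)
print((df["NEW_DATE"] == via_offset).all())  # True
```

The apply version calls Python code once per row, so on large frames the vectorized DateOffset is usually the faster choice.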
0
votes

If you have a single pd.Timestamp object rather than a column:

  1. Using pd.DateOffset(years=n) is not ideal, as it produces:

UserWarning: Discarding nonzero nanoseconds in conversion

  2. pd.Timedelta() doesn't accept years.

The only approach that worked for me in this case is pd.Timestamp.replace:

import pandas as pd
n = 1  # number of years to subtract
t = pd.Timestamp.now()
t = t.replace(year=t.year - n)
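For a single Timestamp, relativedelta from the earlier answer can also be applied directly; it keeps the time of day and clamps 29 February to 28 February. A sketch with a made-up timestamp value:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

n = 1  # number of years to subtract
t = pd.Timestamp("2008-02-29 13:45:00")  # hypothetical example value

# Calendar-aware shift of a single Timestamp; 2007 has no 29 February.
shifted = t - relativedelta(years=n)
print(shifted)  # 2007-02-28 13:45:00
```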

This was hinted at in the answer by Padriac, but it needed further explanation.