I have a dataframe of patients and operations with 6 date columns. The dates are in day-month-year format. To get the hospital length of stay I need to subtract the admission date (ADMIDATE) from the discharge date (DISDATE), so I want to convert the date columns to datetime columns.

As an example:

    ADMIDATE    DISDATE
0   06/06/2014  07/06/2014
1   23/06/2014  23/06/2014

If I use

pd.read_csv('/Users/.......csv', parse_dates=['ADMIDATE', 'DISDATE'])

I get

    ADMIDATE    DISDATE
0   2014-06-06  2014-07-06
1   2014-06-23  2014-06-23

and the 7th of June is turned into the 6th of July (DISDATE, row 0). If I use the stricter

for col in ['ADMIDATE', 'DISDATE']:
    df[col] = pd.to_datetime(df[col], format='%d/%m/%Y')

it works:

    ADMIDATE    DISDATE
0   2014-06-06  2014-06-07
1   2014-06-23  2014-06-23

But it won't accept the many empty rows where, for instance, the patient hasn't yet been discharged at the time of data collection. I can format the date columns in Excel to get the CSV into year-month-day format and then use parse_dates, which works correctly, but I would like to know what I can do with to_datetime.

1 Answer

dayfirst

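This argument tells the parser to read ambiguous dates day first (07/06/2014 becomes the 7th of June). Note that it is a preference rather than a strict format check.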

df.apply(pd.to_datetime, dayfirst=True)

    ADMIDATE    DISDATE
0 2014-06-06 2014-06-07
1 2014-06-23 2014-06-23

To specify columns:

df[['ADMIDATE', 'DISDATE']].apply(pd.to_datetime, dayfirst=True)

    ADMIDATE    DISDATE
0 2014-06-06 2014-06-07
1 2014-06-23 2014-06-23
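
Since the question loads the data with read_csv, the same preference can probably be applied at load time, because read_csv also takes a dayfirst argument. A minimal sketch, assuming a small in-memory CSV (csv_text and df_csv are made-up names standing in for the real file and dataframe):

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file on disk.
csv_text = (
    "ADMIDATE,DISDATE\n"
    "06/06/2014,07/06/2014\n"
    "23/06/2014,23/06/2014\n"
)

df_csv = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["ADMIDATE", "DISDATE"],
    dayfirst=True,  # prefer DD/MM/YYYY when a date is ambiguous
)

print(df_csv.dtypes)
print(df_csv)
```

This keeps the parsing in one place, but it is still a preference and not a strict format, so an explicit format string with to_datetime is the safer choice if the column might contain mixed layouts.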

errors='coerce'

To accommodate missing or unparseable data, which is converted to NaT instead of raising an error:

df[['ADMIDATE', 'DISDATE']].apply(
    pd.to_datetime, dayfirst=True, errors='coerce'
)

    ADMIDATE    DISDATE
0 2014-06-06 2014-06-07
1 2014-06-23 2014-06-23
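
Putting it together for the length-of-stay calculation, here is a rough sketch that combines the strict format from the question with errors='coerce'. The third row, the date_cols list and the LOS_DAYS column are invented for illustration; the third patient has no discharge date yet.

```python
import pandas as pd

# Hypothetical sample: the third patient has not been discharged yet.
df = pd.DataFrame(
    {
        "ADMIDATE": ["06/06/2014", "23/06/2014", "01/07/2014"],
        "DISDATE": ["07/06/2014", "23/06/2014", None],
    }
)

date_cols = ["ADMIDATE", "DISDATE"]

# apply() returns a new DataFrame, so assign the result back.
# Missing or unparseable values become NaT instead of raising.
df[date_cols] = df[date_cols].apply(
    pd.to_datetime, format="%d/%m/%Y", errors="coerce"
)

# Length of stay in whole days; NaN where the discharge date is missing.
df["LOS_DAYS"] = (df["DISDATE"] - df["ADMIDATE"]).dt.days

print(df)
```

One thing to watch: errors='coerce' also silently turns genuinely malformed dates into NaT, so it may be worth counting the NaT values per column afterwards and comparing against the number of blanks you expect.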