I am trying to parse data from a csv file, sort them by date and write the sorted dataframe in a new csv file.
Say we have a very simple csv file with date entries following the pattern day/month/year:
Date,Reference
15/11/2020,'001'
02/11/2020,'002'
10/11/2020,'003'
26/11/2020,'004'
23/10/2020,'005'
I read the csv into a Pandas dataframe. When I attempt to order the dataframe based on the dates in ascending order I expect the data to be ordered as follows:
23/10/2020,'005'
02/11/2020,'002'
10/11/2020,'003'
15/11/2020,'001'
26/11/2020,'004'
Sadly, this is not what I get.
If I attempt to convert the date
to datetime
and then sort, then some date entries are converted to the month/day/year (e.g. 2020-10-23 instead of 2020-23-10) which messes up the ordering:
date reference
2020-02-11 '002'
2020-10-11 '003'
2020-10-23 '005'
2020-11-15 '001'
2020-11-26 '004'
If I sort without converting to datetime
, then the ordering is also wrong:
date reference
02/11/2020 '002'
10/11/2020 '003'
15/11/2020 '001'
23/10/2020 '005'
26/11/2020 '004'
Here is my code:
import pandas as pd
df = pd.read_csv('order_dates.csv',
header=0,
names=['date', 'reference'],
dayfirst=True)
df.reset_index(drop=True, inplace=True)
# df.date = pd.to_datetime(df.date)
df.sort_val
df.sort_values(by='date', ascending=True, inplace=True)
print(df)
df.to_csv('sorted.csv')
Why is sorting by date so hard? Can someone explain why the above sorting attempts fail?
Ideally, I would like the sorted.csv
to have the date entries in the day/month/year format.