4
votes

I'm struggling to join panda dataframes from csv files that contain UK date format of dd/mm/yyyy.

The data contained within the csv file is:

UK Date     Price
30/12/2015  120
31/12/2015  123
...         ... 
11/01/2016  135
12/01/2016  144

My issue is that the data corresponding to 11/01/2016 is appended to 1st November 2016 on the DataFrame and conversely, the data in the row of 01/11/2016 is appending to 11th January 2016.

Here is my simple code I am using to create a DataFrame within a date range and joining data from a temporary dataframe:

# Define the dates
dates = pd.date_range('2015-12-01', '2018-06-07')

# Create an empty DataFrame
df1 = pd.DataFrame(index = dates)

# Read the data into a temp dataframe
dftemp = pd.read_csv("a.csv", index_col='UK Date', parse_dates = True,
                        usecols = ['UK Date', 'Price'])

# Join the two DataFrames
df1 = df1.join(dftemp, how='inner')

print df1

I am not sure if it is best to try and convert the blank dataframe dates to UK format or to change the format when I read the file in? Also, what is the best way to change the format?

Thanks

1

1 Answers

3
votes

I believe the best is convert to datetimes in read_csv by parameter dayfirst:

df = pd.read_csv("a.csv",
                 index_col='UK Date', 
                 parse_dates = True, 
                 dayfirst=True,
                 usecols = ['UK Date', 'Price'])

Another solutions with to_datetime:

df['UK Date'] = pd.to_datetime(df['UK Date'], dayfirst=True)
#if need DatetimeIndex 
#df.index = pd.to_datetime(df.index, dayfirst=True)

Or:

df['UK Date'] = pd.to_datetime(df['UK Date'], format='%d/%m/%Y')
#if need DatetimeIndex 
#df.index = pd.to_datetime(df.index, format='%d/%m/%Y')