0
votes

Still not getting the hang of pandas, I am attempting to join two data frames in Pandas using merge. I have read in the CSVs into two data frames (named dropData and deosData in the code below). Both data frames have the column ‘Date_Time’, which is a parsed column of Date and Time information to create a unique id for each entry. The deosData file is an entire year’s worth of observations that I am trying to match up with corresponding entries in dropData.

CSV files:

deosData: https://www.dropbox.com/s/3rr7hf7jzrmxdke/inputDeos.csv?dl=0

dropData: https://www.dropbox.com/s/z9mv4xccjzlsyif/inputDrop.csv?dl=0

I have gone through the documentation for the merge function and have tried the following code in various iterations, so far I have only been able to have a blank data frame with correct header row, or have the two data frames merged on the 0--(N-1) indexing that is assigned by default:

My code:

import pandas as pd
import numpy as np
import os
from matplotlib import pyplot as plt

#read in CSV to dataframe
dropData=pd.read_csv("inputDrop.csv", header=0, index_col=None)
deosData=pd.read_csv("inputDeos.csv", header=0, index_col=None)

#merging dataframes into single sf
merge=pd.merge(dropData,deosData, how='inner', on='Date_Time')
#comment out during debugging
#merge.to_csv('output.csv', sep=',', headers=True, index=False)

#check merge dataframe creation
print merge.head(1)

After searching on SE and the Doc’s I have tried resetting the index, ignoring the index columns, copying the ‘Date_Time’ column as a separate index and trying to merge on the new column, I have tried using ‘on=None’, ‘left_on’ and ‘right_on’ as permutations of ‘Date_Time’ to no avail. I have checked the column data types, ‘Date_Time’ in both are dtype Objects, I do not know if this is the source of the error, since the only issues I could find searching revolved around matching different dtypes to each other.

What I am looking to do is have the two data frames merge where the two 'Date_Time' columns intersect. For example:

    Date_Time,Volume(Max),Volume(Sum),Volume(Min),Volume(Mean),Diameter(Count),Diameter(Max),Diameter(Sum),Diameter(Min),Diameter(Mean),Depth(Sum),Velocity(Max),Velocity(Sum),Velocity(Min),Velocity(Mean), Air Temperature (deg. C), Relative humidity (%), Wind Speed (m.s-1), Wind Direction (deg.), Wind Gust Speed (5) (m.s-1), Barometric Pressure (mbar), Gage Precipitation (5) (mm)
9/1/2014 0:00,2.266188524,2.989272461,0.052464219,0.332141385,9,1.629668,5.972978,0.464467,0.663664222,0.003736591,2.288401,16.889656,1.495487,1.876628444,22.5,99,0,216.1,0.4,1016.2,0

Any help would be greatly appreciated.

2
What is your question? Or issue?Kirell

2 Answers

0
votes

You need to parse_dates when reading csv file, so that Date_Time columns in both dataframes are of pd.Timestamp object instead of raw strings. (if you look at your csv file, one is in ISO format YYYY-MM-DD HH:MM:SS whereas the other is in MM/DD/YYYY HH:MM) Try the following codes:

#read in CSV to dataframe
dropData = pd.read_csv("inputDrop.csv", header=0, index_col=None, parse_dates=['Date_Time'])
deosData = pd.read_csv("inputDeos.csv", header=0, index_col=None, parse_dates=['Date_Time'])

and then do your merge.

0
votes

You can use join, but you first need to set the index:

dropData=pd.read_csv('.../inputDrop.csv', header=0, index_col='Date_Time', parse_dates=True)
deosData=pd.read_csv('.../inputDeos.csv', header=0, index_col='Date_Time', parse_dates=True)
dropData.join(deosData)