I have two h2o frames that share the columns site_id and timestamp. I need to merge them with a left join on those two columns. The site_id column is of type int and the timestamp column is of type time in both frames; I confirmed this by running describe() on each frame.

df = h2o.H2OFrame.merge(df1, df2, by_x = ["site_id", "timestamp"], by_y=["site_id", "timestamp"])
df.head()

This returns the following error.

H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Merging columns must be the same type, column building_id found types Time and Numeric Request: POST /99/Rapids data: {'ast': "(tmp= py_7_sid_aff9 (merge py_4_sid_aff9 weather_train.hex False False [2 4] [0 1] 'auto'))", 'session_id': '_sid_aff9'}
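
The error names building_id rather than either join key, so it may help to compare the types each frame reports for the key columns before merging. The following is only a minimal sketch: it assumes the dictionaries come from `H2OFrame.types` (column name mapped to a type string), and the helper name `mismatched_key_types` and the sample dictionaries are hypothetical, not taken from the actual data set.

```python
def mismatched_key_types(types_x, types_y, keys_x, keys_y):
    """Return (key_x, type_x, key_y, type_y) for each key pair whose types differ.

    types_x and types_y map column name -> type string, in the shape
    that H2OFrame.types returns (assumption: e.g. "int", "time", "real").
    """
    problems = []
    for key_x, key_y in zip(keys_x, keys_y):
        type_x = types_x.get(key_x)
        type_y = types_y.get(key_y)
        if type_x != type_y:
            problems.append((key_x, type_x, key_y, type_y))
    return problems


# Hypothetical type dictionaries standing in for df1.types and df2.types.
left_types = {"building_id": "int", "meter": "int", "site_id": "int", "timestamp": "time"}
right_types = {"site_id": "int", "timestamp": "int", "air_temperature": "real"}

keys = ["site_id", "timestamp"]
print(mismatched_key_types(left_types, right_types, keys, keys))
```

With real frames the call would look like `mismatched_key_types(df1.types, df2.types, ["site_id", "timestamp"], ["site_id", "timestamp"])`. An empty list means the key columns agree by name, which would suggest the mismatch comes from somewhere other than the key types themselves.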

The data set I am using is available at https://www.kaggle.com/c/ashrae-energy-prediction

Comment (johncasey): h2o version is '3.26.0.3'.

1 Answer

I've found a workaround. I first rename the timestamp column in each frame (to t1 and t2) and join the two h2o frames on site_id only, so the result contains both timestamp columns. Then I keep only the rows where t1 and t2 hold the same value.

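# Rename the timestamp columns with set_name(), which renames in place;
# assigning into the list returned by .columns may not rename the frame.
train_meta_df.set_name(2, "t1")
weather_train_df.set_name(1, "t2")

# Join on site_id only, then keep the rows whose timestamps agree.
df = h2o.H2OFrame.merge(train_meta_df, weather_train_df, by_x=["site_id"], by_y=["site_id"])
df = df[df["t1"] == df["t2"]]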
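
To make the behaviour of this join-then-filter approach easier to see, here is a plain Python sketch that does not use h2o at all. The rows, the column names meter_reading and air_temperature, and the function name `join_then_filter` are made up for illustration; timestamps are shown as small integers for brevity.

```python
def join_then_filter(left, right):
    """Join on site_id only, then keep the pairs whose timestamps match."""
    joined = []
    for left_row in left:
        for right_row in right:
            if left_row["site_id"] == right_row["site_id"]:
                joined.append({
                    "site_id": left_row["site_id"],
                    "t1": left_row["timestamp"],   # timestamp from the left rows
                    "t2": right_row["timestamp"],  # timestamp from the right rows
                    "meter_reading": left_row["meter_reading"],
                    "air_temperature": right_row["air_temperature"],
                })
    # Second step of the workaround: drop pairs whose timestamps differ.
    return [row for row in joined if row["t1"] == row["t2"]]


# Made-up rows; the site 1 reading has no weather row at the same timestamp.
left = [
    {"site_id": 0, "timestamp": 1, "meter_reading": 10.0},
    {"site_id": 0, "timestamp": 2, "meter_reading": 11.0},
    {"site_id": 1, "timestamp": 1, "meter_reading": 12.0},
]
right = [
    {"site_id": 0, "timestamp": 1, "air_temperature": 20.0},
    {"site_id": 0, "timestamp": 2, "air_temperature": 21.0},
    {"site_id": 1, "timestamp": 2, "air_temperature": 22.0},
]

result = join_then_filter(left, right)
for row in result:
    print(row)
```

Two things stand out in this sketch. The site 1 reading disappears from the result, so the workaround behaves like an inner join on both keys rather than the left join asked for in the question. Also, the intermediate join pairs every left row with every right row of the same site before filtering, so on a large data set that intermediate frame can be far bigger than the final result.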

Still, I believe this is a bug that should be fixed.