0
votes

I want to merge two DataFrames on their index and keep only the distinct columns after merging.

Currently, I merge with pd.merge(X_train, all_data, left_index=True, right_index=True). But the columns that exist in both DataFrames are all returned, with _x and _y appended to the column names to tell them apart.

I just need the distinct columns.

Thanks!

1 Answer

3
votes

You could work out which columns of all_data are not already in X_train before the merge, and then pass only those columns to the merge command:

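X_train_cols = X_train.columns
all_data_cols = all_data.columns
# Columns of all_data that X_train lacks, kept in their original order (a set would not guarantee order)
all_data_cols_new = [c for c in all_data_cols if c not in X_train_cols]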

Then:

pd.merge(X_train, all_data[all_data_cols_new], left_index=True, right_index=True)
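
To see the effect end to end, here is a minimal sketch on toy data. The two small frames are hypothetical stand-ins for X_train and all_data (they share the column "a"), and the column filter uses an order-preserving list comprehension, which is a choice made for this example:

```python
import pandas as pd

# Hypothetical stand-ins for X_train and all_data; both contain column "a".
X_train = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[10, 11])
all_data = pd.DataFrame({"a": [1, 2], "c": [5, 6]}, index=[10, 11])

# Keep only the columns of all_data that X_train does not already have,
# preserving their original order.
new_cols = [c for c in all_data.columns if c not in X_train.columns]

# Merge on the index; no overlapping columns remain, so no _x/_y suffixes.
merged = pd.merge(X_train, all_data[new_cols], left_index=True, right_index=True)
print(list(merged.columns))  # ['a', 'b', 'c']
```

Note that this keeps X_train's copy of any shared column and drops all_data's copy, so it only makes sense if the shared columns hold the same values in both frames.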