How to merge the returned one-hot encoded columns to original dataframe?

Question

I have a banking_dataframe with 21 columns: one target, 10 numeric features, and 10 categorical features. I used the get_dummies method of pandas to one-hot encode the categorical data, and the returned dataframe has 74 columns. Now I want to merge the encoded dataframe with the original one, so that the final data has one-hot encoded values for the categorical columns but keeps the original size of the dataframe, i.e. 21 columns.

Link to get_dummies function of Pandas:

Code snippet to call get_dummies on categorical features

encoded_features = pd.get_dummies(banking_dataframe[categorical_feature_names])

banking_dataframe.join(pd.get_dummies(banking_dataframe[categorical_feature_names]))? — political scientist
I tried both the pd.concat and join strategies, and the results are the same in both cases. To explain further: the original dataframe was (41188, 21) in size, and after encoding and concatenating, the data is (41188, 74), so the dimensions have increased. Don't we need to bring them back to the original size after encoding? Or should I pass the data with the new dimensions to my model? — Fariha Abbasi
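
One-hot encoding replaces each categorical column with one indicator column per category, so the encoded data is necessarily wider than the original and cannot be brought back to 21 columns without losing the encoding. A minimal sketch of the usual pattern follows, using a hypothetical toy frame (the column names age, job, marital, and y are assumptions for illustration, not the actual banking columns). Passing columns= to pd.get_dummies encodes only the listed columns, drops their originals, and leaves the remaining columns untouched.

```python
import pandas as pd

# Hypothetical toy stand-in for banking_dataframe (column names are assumed)
banking_dataframe = pd.DataFrame({
    "age": [30, 45, 52, 28],
    "job": ["admin", "technician", "admin", "services"],
    "marital": ["married", "single", "married", "single"],
    "y": [0, 1, 0, 1],
})
categorical_feature_names = ["job", "marital"]

# Encode only the categorical columns. Their originals are replaced by
# indicator columns; the numeric and target columns pass through unchanged.
encoded_dataframe = pd.get_dummies(
    banking_dataframe, columns=categorical_feature_names
)

print(banking_dataframe.shape)   # shape before encoding
print(encoded_dataframe.shape)   # same rows, more columns
print(list(encoded_dataframe.columns))
```

The number of rows stays the same; only the number of columns grows. The widened frame (minus the target column) is what would normally be passed to the model.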

Ajay Dyavathi · Accepted Answer · 2021-05-28T06:07:07

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# creating a toy data frame to test
df = pd.DataFrame({'Gender': ['M', 'F', 'M', 'M', 'F', 'F', 'F']})

# instantiating and transforming the 'Gender' column of the df
one_hot = OneHotEncoder()
encoded = one_hot.fit_transform(df[['Gender']])

# The fitted one_hot object has an attribute 'categories_', which stores
# one array of category labels per encoded column, in order. The labels
# for 'Gender' can serve as the new column names in our data frame.

df[one_hot.categories_[0]] = encoded.toarray()
