How to one-hot encode a dataframe column with multiple strings?

Question

I am currently working on building a regressor model to predict the food delivery time.

Here is the dataframe with a few observations:

As you can see, each row of the Cuisines column contains several comma-separated strings. I used this code:

pd.get_dummies(data.Cuisines.str.split(',',expand=True),prefix='c')

This split the strings and one-hot encoded them. However, it introduced a new issue.

I then merged the dataframe with the dummies. fastfood appears in the 1st and 3rd rows, so I expected a single fastfood column with value 1 in the first and third rows. Instead, two fastfood columns are created: one (the 4th column) for the first row and another (the 15th column) for the third row.
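The symptom can be reproduced on a small scale. This is a minimal sketch on a made-up frame (the question's real data is not shown, so the rows below are hypothetical). `str.split(',', expand=True)` puts each cuisine into a positional column and keeps the space that followed the comma, and with `prefix='c'` `pd.get_dummies` applies the same prefix to every column, so 'Fast Food' and ' Fast Food' end up as two separate indicator columns.

```python
import pandas as pd

# Hypothetical toy data: the question's dataframe is not shown, so these
# rows only mimic a comma-separated Cuisines column.
data = pd.DataFrame({
    "Cuisines": ["Fast Food, Pizza", "North Indian", "Pizza, Fast Food"],
})

# The question's approach: split into positional columns 0, 1, ...
# Items after the first comma keep their leading space (" Fast Food").
parts = data.Cuisines.str.split(",", expand=True)

# One-hot encode every positional column with the same prefix "c".
dummies = pd.get_dummies(parts, prefix="c")

# "c_Fast Food" (from column 0) and "c_ Fast Food" (from column 1)
# both appear, i.e. the same cuisine is split across two columns.
print(dummies.columns.tolist())
```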

Can someone help me get a single fastfood column with value 1 in the first and third rows, and likewise for the other cuisines?

It still is the same. This code again creates two different fastfood columns. — Ranjini

Quang Hoang · Accepted Answer · 2019-12-03T15:54:17

The two Fast Food columns differ only by a stray space (for example ' Fast Food' versus 'Fast Food'), so pandas treats them as two different labels. You probably want to try:

data.Cuisines.str.strip().str.replace(r'\s*,\s*', ',', regex=True).str.get_dummies(',')
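A minimal end-to-end sketch of this idea on a hypothetical frame (the `Cuisines` values and the `Delivery_Time` column below are made up for illustration). It normalises the whitespace around each comma before calling `Series.str.get_dummies`. That method appears to use its `sep` argument as a plain string when it builds the indicator columns, so a regex separator is best avoided there. The indicators are then joined back onto the original frame by index.

```python
import pandas as pd

# Hypothetical toy data mimicking the question's Cuisines column;
# "Delivery_Time" is a made-up stand-in for the regression target.
data = pd.DataFrame({
    "Cuisines": ["Fast Food, Pizza", "North Indian", "Pizza, Fast Food"],
    "Delivery_Time": [30, 45, 30],
})

# Strip the ends and collapse any whitespace around each comma,
# so " Fast Food" and "Fast Food" become the same label.
cleaned = data["Cuisines"].str.strip().str.replace(r"\s*,\s*", ",", regex=True)

# Split on the literal "," and build one 0/1 column per distinct cuisine.
dummies = cleaned.str.get_dummies(sep=",").add_prefix("c_")

# Attach the indicator columns to the original frame (aligned on the index).
result = data.join(dummies)
print(result)
```

Because the labels are normalised before encoding, each cuisine maps to exactly one column no matter where it appears in the list.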

