I have a dataframe with int and categorical features. The categorical features are of two types: numbers and strings.
I was able to one-hot encode the int columns and the categorical columns that hold numbers, but I get an error when I try to one-hot encode the categorical columns that hold strings:
ValueError: could not convert string to float: '13367cc6'
Since the dataframe is huge and has high cardinality, I only want to convert it to a sparse form. I would prefer a solution that uses from sklearn.preprocessing import OneHotEncoder, since I am familiar with it.
I checked other questions too, but none of them addresses what I am asking.
data = [[623, 'dog', 4], [123, 'cat', 2], [623, 'cat', 1], [111, 'lion', 6]]
The above dataframe contains 4 rows and 3 columns.
Column names - ['animal_id', 'animal_name', 'number']
Assume that animal_id and animal_name are stored in pandas as category dtype and number as int64 dtype.
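For reproducibility, here is a minimal sketch of the setup described above. The variable names df, encoder and encoded are my own, and the sketch assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string categories directly and returns a SciPy sparse matrix by default. An older version may behave differently and could be where the ValueError comes from.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Data and column names from the question.
data = [[623, 'dog', 4], [123, 'cat', 2], [623, 'cat', 1], [111, 'lion', 6]]
df = pd.DataFrame(data, columns=['animal_id', 'animal_name', 'number'])

# animal_id and animal_name are stored as category, number as int64.
df = df.astype({'animal_id': 'category',
                'animal_name': 'category',
                'number': 'int64'})

# Assumption: scikit-learn >= 0.20. The default settings are used, which
# produce a sparse matrix with one column per distinct value in each column.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df)

print(sparse.issparse(encoded))  # whether the result is a SciPy sparse matrix
print(encoded.shape)             # (rows, total number of one-hot columns)
print(encoder.categories_)       # categories learned for each input column
```

The real dataframe is much larger, so the result has to stay sparse; calling toarray() on it is not an option.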