
I am new to machine learning and I am working with a dataset stored as a CSV file that consists of categorical data. As a preprocessing step, I one-hot encoded all the variables in my dataset.
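For reference, here is a minimal sketch of that preprocessing step with pandas, assuming a tiny hypothetical CSV (only the `buying_price` column name comes from the question; `doors`, `label` and all values are made up). `pd.get_dummies` turns each categorical column into one 0/1 column per category, named `<column>_<category>`, which is where names like `buying_price_high` come from.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; only buying_price is from the question.
csv_text = """buying_price,doors,label
high,2,unacc
low,4,acc
high,4,good
low,2,unacc
"""

df = pd.read_csv(io.StringIO(csv_text))

# One-hot encode every feature column (everything except the target "label").
features = df.drop(columns="label")
encoded = pd.get_dummies(features, columns=list(features.columns), dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each original column is replaced by several binary columns, and every row has exactly one 1 per original column.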

At the moment I am trying to apply a random forest algorithm to classify the entries into one of 4 classes. My problem is that I do not understand exactly what happens to these one-hot encoded variables. How do I feed them to the algorithm? Is it able to tell the difference between buying_price_high and buying_price_low (both one-hot encoded from buying_price)?

I also one-hot encoded the response variable.
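On the response variable: scikit-learn's `RandomForestClassifier` accepts class labels directly, so a one-hot encoded target is not needed. A minimal sketch, assuming hypothetical class names and made-up 0/1 features, of turning a one-hot target back into one label per row with `argmax`:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class_names = np.array(["class_1", "class_2", "class_3", "class_4"])  # hypothetical

# A one-hot encoded target: one row per sample, one column per class.
y_onehot = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])

# Recover a single label per row: the index of the column that holds the 1.
y = class_names[y_onehot.argmax(axis=1)]

# Hypothetical features that are already one-hot encoded (0/1 columns).
X = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.classes_)
print(clf.predict(X[:2]))
```

If the 2D one-hot array were passed as `y` instead, scikit-learn would treat it as a multi-output problem (one binary output per column), which is usually not what you want for a single 4-class target.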

What do you mean when you say "OHEd"? - Joseph Budin
One Hot Encoded - Beatrice

2 Answers

0 votes

One-hot encoding is meant for nominal categorical variables, i.e. variables whose categories have no order relationship. A price variable does have a natural order (low < high), so for it I suggest using OrdinalEncoder instead. scikit-learn is a good package for this kind of preprocessing: see sklearn.preprocessing.OneHotEncoder and sklearn.preprocessing.OrdinalEncoder.
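A minimal sketch of that suggestion with scikit-learn's `OrdinalEncoder`, assuming hypothetical price levels (the `"med"` level is invented for illustration). Passing the levels through the `categories` parameter in their natural order is what makes the integer codes follow that order; without it, the categories are sorted alphabetically (high=0, low=1, med=2), which would not respect the ordering.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical price levels, listed in their natural order: low=0 < med=1 < high=2.
price_order = ["low", "med", "high"]
encoder = OrdinalEncoder(categories=[price_order])

# OrdinalEncoder expects a 2D input: one column per feature.
prices = np.array([["high"], ["low"], ["med"], ["low"]])
codes = encoder.fit_transform(prices)
print(codes.ravel())  # [2. 0. 1. 0.]
```

The price variable then becomes a single numeric column whose values preserve the low < med < high order, instead of several unrelated binary columns.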

0 votes

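I think you're having trouble understanding one-hot encoding. Suppose you have 4 classes. A one-hot encoder converts each label into a binary vector with a single 1 in the position of its class (e.g. [0, 0, 1, 0] for the third class), whereas LabelEncoder assigns them the integers 0, 1, 2, 3 and so on. One-hot encoding is usually the better choice for unordered categories, because a model that treats the integer codes as numbers may read label 3 as "larger" than label 2, implying an order that does not exist.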

Using Label Encoder
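A minimal sketch of what `LabelEncoder` produces, assuming four hypothetical class names. Note that scikit-learn documents `LabelEncoder` as intended for target labels (`y`), not input features.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["class_a", "class_b", "class_c", "class_d", "class_b"]  # hypothetical 4 classes

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)  # one integer per label, assigned in sorted order
print(codes)             # [0 1 2 3 1]
print(encoder.classes_)  # ['class_a' 'class_b' 'class_c' 'class_d']
```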

One Hot encoder