guys. I have a large data set (60k samples with 50 features). One of this features (which is really relevant for me) is job names. There are many jobs names that I'd like to encode to fit in some models, like linear regression or SVCs. However, I don't know how to handle them.
I tried to use pandas dummy variables and Scikit-learn One-hot Encoding, but it generate many features that I may not be encounter on test set. I tried to use the scikit-learn LabelEncoder(), but I also got some errors when I was encoding the variables float() > str() error, for example.
What would you guys recommend me to handle with this several categorical features? Thank you all.