I am currently using H2O's AutoML for a data science project. However, I cannot find anywhere in the documentation, online, or in the code how AutoML treats factor variables. Does it do one-hot encoding? Label encoding? Something more advanced? Does it consider how many levels there are? Does it depend on the algorithm?

Currently, AutoML performs really badly (marginally above the baseline), and I suspect it's because it doesn't treat categoricals right, which make up about 90% of my predictors.

1 Answer

AutoML automatically runs the supervised learning models that are available in H2O-3, so how it handles categoricals depends on the default categorical handling of whichever model it is running. Documentation on the handling of categoricals can be found here. If you are interested in a particular algorithm, use the same documentation to find it and review the details of how it handles categorical values, or use the Python or R API documentation to look up the default values.
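
If you suspect the default handling is hurting you, one way to check is to train a single algorithm outside AutoML with a few explicit settings of its `categorical_encoding` parameter and compare cross-validated scores. The following is only a rough sketch under some assumptions: it uses GBM as a representative algorithm, it assumes a binary classification target (so that AUC is defined), and the helper name `compare_encodings` and the short list of encodings are mine, not something AutoML does internally. Check the API documentation for the encodings your H2O version actually supports.

```python
# Sketch: compare a few categorical encodings for one H2O algorithm (GBM).
# Assumptions: binary classification target, an H2O cluster is already running,
# and `train` is an H2OFrame whose categorical columns are typed as enum.

# A small subset of the values accepted by `categorical_encoding`
# (assumed to be available in your H2O version).
ENCODINGS = ["auto", "enum", "one_hot_explicit", "label_encoder"]


def compare_encodings(train, y, encodings=ENCODINGS, nfolds=5, seed=1):
    """Return {encoding: cross-validated AUC} for a GBM trained on `train`."""
    # Imported here so this module can be loaded without h2o being set up.
    from h2o.estimators import H2OGradientBoostingEstimator

    results = {}
    for enc in encodings:
        model = H2OGradientBoostingEstimator(
            categorical_encoding=enc,  # the setting under comparison
            nfolds=nfolds,             # enables cross-validation metrics
            seed=seed,                 # keeps runs comparable
        )
        # With x omitted, all columns except y are used as predictors.
        model.train(y=y, training_frame=train)
        results[enc] = model.auc(xval=True)
    return results
```

To use it, call `h2o.init()`, load your data into an H2OFrame, convert the target with `asfactor()`, and then call `compare_encodings(train, "target")`, where `"target"` is a placeholder for your response column. If the scores barely move across encodings, the categorical handling is probably not the reason for the poor results.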