I'm trying to train a model with H2O.ai's H2O-3 AutoML Algorithm on AWS SageMaker, using the SageMaker console.
My model's goal is to predict whether an arrest will be made based on the year, type of crime, and location.
My data has 8 columns:
- primary_type: enum
- description: enum
- location_description: enum
- arrest: enum (true/false), this is the target column
- domestic: enum (true/false)
- year: number
- latitude: number
- longitude: number
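For reference, the schema above can be written as a small Python dict, and the categorical_columns value I'm passing is just the enum columns minus the target. This is only a sketch of how I built that string; SCHEMA, TARGET and categorical_columns are my own names, not anything from H2O or SageMaker.

```python
from typing import Dict, List

# Column types as listed above; "enum" means categorical in H2O terms.
SCHEMA = {
    "primary_type": "enum",
    "description": "enum",
    "location_description": "enum",
    "arrest": "enum",  # target column (true/false)
    "domestic": "enum",  # true/false
    "year": "number",
    "latitude": "number",
    "longitude": "number",
}
TARGET = "arrest"


def categorical_columns(schema: Dict[str, str], target: str) -> List[str]:
    """Return the enum columns other than the target, in schema order."""
    return [name for name, kind in schema.items() if kind == "enum" and name != target]


cats = categorical_columns(SCHEMA, TARGET)
print(",".join(cats))
# prints: primary_type,description,location_description,domestic
```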
When I use the SageMaker console on AWS and create a new training job using the H2O-3 AutoML Algorithm, I specify the primary_type, description, location_description, and domestic columns as categorical.
However, in the training job's logs I always see the following two lines:
Converting specified columns to categorical values:
[]
This leads me to believe that the categorical_columns attribute in the training hyperparameter is not being taken into account.
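One hypothetical way to end up with an empty list: if the script looks for categorical_columns inside a JSON object passed as the training hyperparameter and falls back to an empty list, any key that isn't inside that object is silently ignored. The read_categorical_columns function below is written purely under that assumption; I haven't confirmed that the real script behaves this way.

```python
import json
from typing import Dict, List


def read_categorical_columns(hyperparameters: Dict[str, str]) -> List[str]:
    """HYPOTHETICAL reader: expects categorical_columns inside a JSON object
    passed as the "training" hyperparameter; defaults to an empty list."""
    training = json.loads(hyperparameters.get("training", "{}"))
    raw = training.get("categorical_columns", "")
    # Split on commas, trim whitespace, and drop empty fragments.
    return [name.strip() for name in raw.split(",") if name.strip()]


# Keys supplied at the top level rather than inside "training".
flat = {
    "classification": "true",
    "categorical_columns": "primary_type,description,location_description,domestic",
    "target": "arrest",
}
print("Converting specified columns to categorical values:")
print(read_categorical_columns(flat))  # prints: []

# The same settings JSON-encoded under a single "training" key.
nested = {"training": json.dumps(flat)}
print(read_categorical_columns(nested))
# prints: ['primary_type', 'description', 'location_description', 'domestic']
```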
I have tried the following hyperparameters with the same output in the logs each time:
{'classification': 'true', 'categorical_columns':'primary_type,description,location_description,domestic', 'target': 'arrest'}
{'classification': 'true', 'categorical_columns':['primary_type','description','location_description','domestic'], 'target': 'arrest'}
I thought the list of categorical columns was supposed to be comma-delimited, and that the script would then split it into a list.
I expected the logs to show the list of categorical column names instead of an empty list, like so:
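Here is the comma splitting I had in mind, applied to both forms I tried. SageMaker passes every hyperparameter value to the container as a string, so the list form would reach the script as literal text, brackets and quotes included. split_columns is my own helper for illustration, not code from the H2O script.

```python
from typing import List


def split_columns(value: str) -> List[str]:
    """Split a comma-delimited hyperparameter value, trimming whitespace."""
    return [part.strip() for part in value.split(",") if part.strip()]


# First attempt: a plain comma-delimited string.
as_string = "primary_type,description,location_description,domestic"
print(split_columns(as_string))
# prints: ['primary_type', 'description', 'location_description', 'domestic']

# Second attempt: the list typed as text. Because hyperparameter values are
# strings, the brackets and quotes end up inside each "name".
as_list_text = "['primary_type','description','location_description','domestic']"
print(split_columns(as_list_text))
# prints: ["['primary_type'", "'description'", "'location_description'", "'domestic']"]
```

So even if the key were read, the list form would produce names that don't match any column; the plain comma-delimited string is the form that splits cleanly.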
Converting specified columns to categorical values:
['primary_type','description','location_description','domestic']
Can anyone help me figure out how to get these categorical columns to apply to the training of my model?
Also, I think this is the code that runs when I train my model, but I haven't confirmed that yet: https://github.com/h2oai/h2o3-sagemaker/blob/master/automl/automl_scripts/train#L93-L151