
I have imported my dataset into H2O Flow. One column is categorical, and I want to convert it to a numerical data type.

In pandas I would do it like this:

import pandas as pd
df['category_column'] = df['category_column'].astype('category')
# Replace each label with its integer category code (.cat works on the Series, not inside apply)
df['category_column'] = df['category_column'].cat.codes
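A self-contained version of the pandas approach, using a small hypothetical frame with Iris-style species labels (the column name `category_column` comes from the question; the values are made up for illustration), shows which integer each label receives:

```python
import pandas as pd

# Hypothetical data: Iris-style species labels in a single column
df = pd.DataFrame({"category_column": ["setosa", "virginica", "versicolor", "setosa"]})

# Convert to pandas' categorical dtype; categories are sorted lexicographically
df["category_column"] = df["category_column"].astype("category")

# Keep the code-to-label mapping so the integers can be decoded later
mapping = dict(enumerate(df["category_column"].cat.categories))

# Replace the labels with their integer codes (a plain integer column)
df["category_column"] = df["category_column"].cat.codes

print(df["category_column"].tolist())  # [0, 2, 1, 0]
print(mapping)  # {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
```

The codes follow the sorted category order, not the order of first appearance. Fitting a regression to them treats the classes as if they were ordered, equally spaced numbers.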

How can I do this in H2O Flow?

I tried the following:

  1. While parsing the data, I changed the data type from enum to numeric, but the column values are then displayed as ·.
  2. I tried the "convert to numeric" option, but it didn't work as I expected.

I'm not sure whether I'm going in the right direction. How can I solve this?

Update, as suggested in the comments:

Why does GLM force me to use a numerical column?

Error evaluating cell

My dataset looks like this:


When I use GLM to build a model with the categorical column as my response_column, I get the following error:

Error calling POST /3/ModelBuilders/glm with opts {"model_id":"glm-e2ed0066-636c-4c71-bf8...

ERROR MESSAGE: Illegal argument(s) for GLM model: glm-e2ed0066-636c-4c71-bf8c-04525eb05002. Details: ERRR on field: _response: Regression requires numeric response, got categorical. For more information visit: http://jira.h2o.ai/browse/TN-2

The accepted answer seems to show the opposite of what was asked (numeric to enum, not enum to numeric-of-the-category-codes)? (Though I cannot think of a case where numeric-of-the-category-codes would be better than keeping it as the enum type, which is maybe why it cannot be done from Flow?) - Darren Cook
@DarrenCook - You are right. But when I try to use a GLM model, it won't accept the enum type; that's why I would like to convert this enum column to numeric. - Mohamed Thasin ah
Can you show some examples of your data? I wonder if your question is actually: "H2O has mistakenly recognized my numeric column as enum when importing?" (If the data is genuinely categorical, then maybe the question should be: "I'm being forced to use a GLM on categorical data, what are my options?") - Darren Cook
@DarrenCook - I updated the question. I was using the simple Iris dataset for the model. - Mohamed Thasin ah

2 Answers


To run GLM with a categorical response column, set the family to "multinomial" (or "binomial" when there are only two classes).
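The rule in this answer can be sketched as a small helper. `choose_glm_family` is hypothetical (it is not part of H2O) and only encodes the choice described above; `"gaussian"` is assumed as the family for a numeric response. In H2O's Python API, the resulting string would be passed as the `family` argument of `H2OGeneralizedLinearEstimator`.

```python
from typing import Sequence


def choose_glm_family(response: Sequence[object], is_categorical: bool) -> str:
    """Hypothetical helper: pick a GLM family for a response column.

    A numeric response can use a regression family ("gaussian" assumed here);
    a categorical response needs "binomial" (two classes) or "multinomial" (more).
    """
    if not is_categorical:
        return "gaussian"
    n_levels = len(set(response))
    if n_levels < 2:
        raise ValueError("a categorical response needs at least two classes")
    return "binomial" if n_levels == 2 else "multinomial"


# The Iris species column has three classes, so multinomial is required
species = ["setosa", "versicolor", "virginica", "setosa"]
print(choose_glm_family(species, is_categorical=True))  # multinomial
```

The `is_categorical` flag is passed explicitly because, in H2O, whether a column is enum is column metadata rather than something inferred from its values.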



If you are using H2O's Python API, you can convert a numeric column to enum using .asfactor(), for example: df['my_column'] = df['my_column'].asfactor()
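For readers coming from the pandas snippet in the question, a rough pandas counterpart of `.asfactor()` is `astype("category")`, and `pd.Categorical.from_codes` turns integer codes back into labels. This is a pandas sketch, not H2O's API; the codes and the category order below are hypothetical and must match whatever order was used when encoding.

```python
import pandas as pd

# Hypothetical numeric codes, e.g. produced earlier by .cat.codes
codes = pd.Series([0, 2, 1, 0])

# Numeric -> categorical: roughly what H2O's .asfactor() does for an H2OFrame column
as_category = codes.astype("category")

# Codes -> original labels, given the category order used when encoding
labels = pd.Categorical.from_codes(codes, categories=["setosa", "versicolor", "virginica"])

print(as_category.dtype)  # category
print(list(labels))  # ['setosa', 'virginica', 'versicolor', 'setosa']
```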

In Flow, after you import the dataset, you will see a data type drop-down menu next to each column name, where you can change the column's type by selecting enum. You can also do this after you have parsed the dataset: when you view the data, each column's row contains a hyperlink you can click to convert its data type from numeric to enum.

Please see the documentation for more details: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html#parsing-data