4
votes

I have imported my dataset into H2O Flow. One column is of categorical type, and I want to convert it to a numeric data type.

If I were using pandas for this task, I would do it like this:

df['category_column'] = df['category_column'].astype('category')
df['category_column'] = df['category_column'].cat.codes
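To make the goal concrete, here is a small self-contained sketch of what that pandas conversion produces. The sample values and the extra `category_code` column are made up for illustration; only `astype('category')` and `.cat.codes` come from the snippet above.

```python
import pandas as pd

# Made-up sample values standing in for the real column.
df = pd.DataFrame({"category_column": ["setosa", "versicolor", "setosa", "virginica"]})

# Convert to the pandas categorical dtype, then take the integer code of each value.
as_category = df["category_column"].astype("category")
df["category_code"] = as_category.cat.codes

# Mapping from each integer code back to its original label.
code_to_label = dict(enumerate(as_category.cat.categories))

print(df)
print(code_to_label)
```

The codes follow the sorted order of the category labels, so they are arbitrary integers rather than meaningful quantities.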

How can I do this in H2O Flow?

I tried the following:

  1. While parsing the data, I changed the data type from enum to numeric, but the data then shows up as ·.
  2. I tried the "convert to numeric" option, but it didn't work as I wanted.

I don't know whether I'm going in the right direction. Please help me solve this issue.

Update on question as suggested:

Why does GLM force me to use a numeric column?

Error evaluating cell

My dataset looks like this:

[image: the dataset]

When I use GLM to build a model with I as my response_column, I get the following error:

Error calling POST /3/ModelBuilders/glm with opts {"model_id":"glm-e2ed0066-636c-4c71-bf8...

ERROR MESSAGE: Illegal argument(s) for GLM model: glm-e2ed0066-636c-4c71-bf8c-04525eb05002. Details: ERRR on field: _response: Regression requires numeric response, got categorical. For more information visit: http://jira.h2o.ai/browse/TN-2

2
The accepted answer seems to show the opposite of what was asked (numeric to enum, not enum to numeric-of-the-category-codes)? (Though I cannot think of a case where numeric-of-the-category-codes would be better than keeping the enum type, which is maybe why it cannot be done from Flow?) - Darren Cook
@DarrenCook - You are right. But when I try to use a GLM model, it won't accept the enum type. That's why I would like to convert this enum into numeric. - Mohamed Thasin ah
Can you show some examples of your data? I wonder if your question is actually: "H2O has mistakenly recognized my numeric column as enum when importing?" (If the data is genuinely categorical, then maybe the question should be: "I'm being forced to use a GLM on categorical data, what are my options?") - Darren Cook
@DarrenCook - I've updated the question. I was using the simple Iris dataset for the model. - Mohamed Thasin ah

2 Answers

2
votes

To run GLM on categorical data, set the family to "multinomial" (or "binomial" when there are only two classes).

[image]
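For anyone working from the Python API rather than Flow, the same idea can be sketched roughly as below. This is only an outline under assumptions: the CSV path and the "species" response column are hypothetical, and `glm_family` is a helper I made up to pick the family from the number of classes. The h2o import sits inside the function so nothing connects to a cluster until you call it.

```python
def glm_family(n_classes):
    """Pick a classification family: binomial for two classes, multinomial otherwise."""
    return "binomial" if n_classes == 2 else "multinomial"


def train_classification_glm(csv_path, response="species"):
    """Train a GLM on a categorical response (path and column name are hypothetical)."""
    import h2o
    from h2o.estimators.glm import H2OGeneralizedLinearEstimator

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)

    # Keep the response as enum; a categorical response needs a classification family.
    frame[response] = frame[response].asfactor()
    n_classes = frame[response].nlevels()[0]

    predictors = [col for col in frame.columns if col != response]
    model = H2OGeneralizedLinearEstimator(family=glm_family(n_classes))
    model.train(x=predictors, y=response, training_frame=frame)
    return model
```

With the three-class Iris data, `glm_family(3)` gives "multinomial", so the response can stay as enum and no conversion to numeric is needed.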

3
votes

If you are using H2O's Python API, you can convert numeric columns to enum using .asfactor(), for example: df['my_column'] = df['my_column'].asfactor()
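A rough sketch of that conversion, plus the reverse direction the question asks about, might look like this. The column names and the `enum_round_trip` helper are made up for illustration. My understanding is that calling `.asnumeric()` on an enum column with non-numeric levels returns the underlying level indices, but that is an assumption worth checking on your H2O version.

```python
def enum_round_trip(values, column="my_column"):
    """Convert a column to enum and back to numeric; returns the two column types."""
    import h2o

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.H2OFrame({column: values})

    # To enum, as described in this answer.
    frame[column] = frame[column].asfactor()
    enum_type = frame.types[column]

    # Back to numeric (assumption: enum levels become their integer level indices).
    codes_column = column + "_codes"  # hypothetical name for the new column
    frame[codes_column] = frame[column].asnumeric()
    return enum_type, frame.types[codes_column]
```

Calling `enum_round_trip(["a", "b", "a"])` against a running cluster should report an enum type followed by a numeric one.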

In Flow, after you import the dataset, you will see a data type drop-down menu next to each column name, where you can convert the data type to enum by selecting enum from the menu. You can also do this after you have parsed the dataset: when you view the data, there is a hyperlink within each row that you can click to convert the data type from numeric to enum.

Please see the documentation for more details: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html#parsing-data