2
votes

Is deep learning model supports multi-label classification problem or any other algorithms in H2O?

Orginal Response Variable -Tags:

apps, email, mail
finance,freelancers,contractors,zen99
genomes
gogovan
brazil,china,cloudflare
hauling,service,moving
ferguson,crowdfunding,beacon
cms,naytev
y,combinator
in,store,
conversion,logic,ad,attribution

After mapping them on the keys of the dictionary: Then

Response variable look like this:

[74]
[156, 89]
[153, 13, 133, 40]
[150]
[474, 277, 113]
[181, 117]
[15, 87, 8, 11]

Thanks

2
It'd be great if you could say what those numbers represent, if the ordering in the array matters, etc. I.e. what you are trying predict. Currently this question really has no extra information than in your other question: stackoverflow.com/q/53684039/841830 - Darren Cook
@DarrenCook updated - Sanjay Kumar

2 Answers

2
votes

No, H2O only contains algorithms that learn to predict a single response variable at a time. You could turn each unique combination into a single class and train a multi-class model that way, or predict each class with a separate model.

1
votes

Any algorithm that creates a model that gives you "finance,freelancers,contractors,zen99" for one set of inputs, and "cms,naytev" for another set of inputs is horribly over-fitted. You need to take a step back and think about what your actual question is.

But in lieu of that, here is one idea: train some word embeddings (or use some pre-trained ones) on your answer words. You could then average the vectors for each set of values, and hope this gives you a good numeric representation of the "topic". You then need to turn your, say, 100 dimensional averaged word vector into a single number (PCA comes to mind). And now you have a single number that you can give to a machine learning algorithm, and that it can predict.

You still have a problem: having predicted a number, how do you turn that number into a 100-dim vector, and from there in to a topic, and from there into topic words? Tricky, but maybe not impossible.

(As an aside, if you turn the above "single number" into a factor, and have the machine learning model do a categorization, to predicting the most similar topic to those it has seen before... you've basically gone full circle and will get a model identical to the one you started with that has too many classes.)