
I'm performing a grid search for a GBM in H2O with a continuous outcome and continuous predictors. I train with cross-validation and then predict on a test set.

I'm using the function .predict_leaf_node_assignment:

best_gbm.predict_leaf_node_assignment(test_frame_h2o) (where best_gbm is the best GBM model I got from the grid search)

and get the following table, which shows the leaf node assignment per tree (T1, T2, T3, etc.).

[Image: table of leaf node assignments per tree]

Question 1:

How can I get the value at each leaf for T1, T2, T3, etc. in the table above, rather than the location of the leaf?

Question 2:

If there is a way to get the values for T1, T2, T3, etc., what do they actually represent? Is T1 the first prediction, with T2, T3, T4 being the corrections? Or is T1 the prediction, T2 the corrected T1, and so on?

Thanks.

Edit: I tried to download the MOJO in Python, as explained on this page, so that I can look into the individual trees: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html?highlight=mojo

In "Step 2: Compile and run the MOJO" the 2nd part of this step is given only in R: "Create your main program in the experiment folder by creating a new file called main.java (for example, using “vim main.java”). Include the following contents. Note that this file references the GBM model created above using R."

Can I do this in Python? I tried copying, for example, the line "import java.io.*" into a Jupyter notebook, but it throws an error (ModuleNotFoundError: No module named 'java').


1 Answer


T1, T2, ..., TN correspond to the first tree built, the second tree built, ..., and the final tree built (so if you set ntrees = 50 you should see T1 through T50). For a multi-class classification problem, each tree column is suffixed with the class, for example T1.C1, T1.C2 (where C1 is class one).
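As a rough illustration of that naming scheme, here is a small Python sketch that builds the column names you would expect to see. The function name leaf_assignment_columns and the parameter nclasses are made up for this example, and the column order for the multi-class case is an assumption; binary classification is not covered.

```python
def leaf_assignment_columns(ntrees, nclasses=1):
    """Build the expected column labels (hypothetical helper, not part of h2o).

    nclasses=1 stands for regression: one column per tree.
    nclasses>=3 stands for multi-class: one column per tree and class.
    The ordering (all classes of T1, then T2, ...) is an assumption.
    """
    if nclasses <= 1:
        return [f"T{t}" for t in range(1, ntrees + 1)]
    return [
        f"T{t}.C{c}"
        for t in range(1, ntrees + 1)
        for c in range(1, nclasses + 1)
    ]


print(leaf_assignment_columns(3))      # regression with ntrees = 3
print(leaf_assignment_columns(2, 3))   # three classes with ntrees = 2
```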

In your posted image it looks like you passed in an H2OFrame with 10 or more rows. Try passing a single row: you will get a single-row frame back, because predict_leaf_node_assignment gives you the path a row took to reach the final leaf node (aka terminal node) in each tree.

Take T1 (the first tree) and the first row of the frame you passed, for example. The path is RRR, which means the row was sent to the right at each split.
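To make the path notation concrete, here is a minimal Python sketch that follows a path string such as RRR through a toy tree. The classes Leaf and Split, the function follow_path, and all leaf values are invented for illustration; this is not how H2O stores its trees.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float  # made-up leaf value


@dataclass
class Split:
    left: Union["Split", Leaf]
    right: Union["Split", Leaf]


def follow_path(root, path):
    """Walk a path string ('L' = left, 'R' = right) from the root."""
    node = root
    for step in path:
        if isinstance(node, Leaf):
            raise ValueError("path is longer than the tree is deep")
        if step == "L":
            node = node.left
        elif step == "R":
            node = node.right
        else:
            raise ValueError(f"unexpected step {step!r}")
    return node


# Toy tree with hypothetical values; only the right branch goes three levels deep.
toy_tree = Split(
    left=Leaf(-0.4),
    right=Split(
        left=Leaf(0.1),
        right=Split(left=Leaf(0.3), right=Leaf(0.7)),
    ),
)

leaf = follow_path(toy_tree, "RRR")
print(leaf.value)  # the value stored at the leaf reached by going right three times
```

The path only tells you where the row ended up; the value stored at that leaf is a separate piece of information, which is why the leaf assignment table alone does not give you per-tree predictions.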

Question 1: if you are asking for the actual predicted value at T1 for a given row in your data, that requires downloading a MOJO and scoring the row with it (this has to be done in Java). Note that the predicted value for that tree will be in the link space, so you need to apply the corresponding inverse link function to get back to the original response scale. The link function used is specified in the MOJO.
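For intuition about the link space, here is a hedged Python sketch. It assumes the usual additive form of gradient boosting, where the link-space score is an initial value plus the sum of the per-tree leaf values, so each tree acts as a correction to the running total rather than as a full prediction. The names INVERSE_LINKS, to_response_scale and predict_from_leaves, and all numbers, are hypothetical. Whether the learning rate is already folded into the stored leaf values is not confirmed here.

```python
import math

# Common inverse link functions (assumed set; check which link your model uses).
INVERSE_LINKS = {
    "identity": lambda f: f,                        # typical for a Gaussian response
    "log": math.exp,                                # typical for count-like responses
    "logit": lambda f: 1.0 / (1.0 + math.exp(-f)),  # typical for binary responses
}


def to_response_scale(link_value, link="identity"):
    """Map a link-space value back to the response scale."""
    return INVERSE_LINKS[link](link_value)


def predict_from_leaves(init_value, leaf_values, link="identity"):
    """Sum the initial value and per-tree leaf values, then invert the link."""
    return to_response_scale(init_value + sum(leaf_values), link)


# Made-up numbers: an initial value and the leaf values a row reached in T1..T3.
print(predict_from_leaves(1.0, [0.5, -0.2, 0.1], link="identity"))
print(to_response_scale(0.0, link="logit"))
```

With a continuous outcome and an identity link, the inverse link changes nothing, so the link-space values would already be on the response scale.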

In response to your edit: no, you cannot paste the example code into a Jupyter notebook, because it is Java, not Python. The walkthrough assumes you have Java (1.7 or greater) installed on your machine and expects you to run the code from your terminal or command prompt.