
I am using H2O Deep Learning from Python on a dataset with two balanced classes, "0" and "1", and adjusted the parameters as follows:

prostate_dl = H2ODeepLearningEstimator(
    activation="Tanh",
    hidden=[50, 50, 50],
    distribution="multinomial",
    score_interval=10,
    epochs=1000,
    input_dropout_ratio=0.2,
    adaptive_rate=True,
    rho=0.998,
    epsilon=1e-8,
)

prostate_dl.train(
    x=x,
    y=y,
    training_frame=train,
    validation_frame=test)

Each time the program runs it gives a different confusion matrix and different accuracy results. Can anyone explain why? How can the results be made reliable?

Also, every run predicts the majority of cases as class "1" rather than "0". Is there any suggestion?

Please move "Also, all of the runs gives the majority prediction as class "1" not "0" , is their any suggestion?" to a separate question (and provide a reproducible example). – Erin LeDell

1 Answer


This question has already been answered here. In short: you need to set reproducible=True when you initialize the H2ODeepLearningEstimator in Python (or reproducible=TRUE in h2o.deeplearning() in R).

Even after setting reproducible=True, H2O Deep Learning results are only reproducible when running on a single core; in other words, when the cluster is started with h2o.init(nthreads = 1). The reasons behind this are outlined here.

Also, per the H2O Deep Learning user guide:

Does each Mapper task work on a separate neural-net model that is combined during reduction, or is each Mapper manipulating a shared object that’s persistent across nodes?

Neither; there’s one model per compute node, so multiple Mappers/threads share one model, which is why H2O is not reproducible unless a small dataset is used and force_load_balance=F or reproducible=T, which effectively rebalances to a single chunk and leads to only one thread to launch a map(). The current behavior is simple model averaging; between-node model averaging via “Elastic Averaging” is currently in progress.
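One reason the thread-scheduling order matters at all, even when every thread does the "same" arithmetic, is that floating-point addition is not associative. This small, self-contained demo (not H2O-specific) shows how merely reordering the same three additions changes the result, which is what happens when partial model updates are combined in a scheduler-dependent order:

```python
# Demo: floating-point summation is order-dependent.
# Combining the same partial results in a different order
# (as concurrent threads do) can yield different totals.
vals = [1e16, 1.0, -1e16]

left_to_right = (vals[0] + vals[1]) + vals[2]  # 1.0 is absorbed into 1e16 first
reordered = (vals[0] + vals[2]) + vals[1]      # the big terms cancel first

print(left_to_right, reordered)  # 0.0 1.0
```

With a single thread (a single chunk, as reproducible=True forces) the combination order is fixed, so the result is deterministic.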