I am trying to create an ML application in which a front end takes user information and data, cleans it, and passes it to h2o AutoML for modeling, then recovers and visualizes the results. Since the back end will be a stand-alone / always-on service that gets called many times, I want to ensure that all objects created in each session are removed, so that h2o doesn't get cluttered and run out of resources. The problem is that many objects are being created, and I am unsure how to identify/track them, so that I can remove them before disconnecting each session.
Note that I would like the ability to run more than one analysis concurrently, which means I cannot just call remove_all(), since this may remove objects still needed by another session. Instead, it seems I need a list of session objects, which I can pass to the remove() method. Does anyone know how to generate this list?
Here's a simple example:
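import h2o
import pandas as pd
from h2o.automl import H2OAutoML
h2o.init()  # start a local H2O instance or attach to one that is already running
df = pd.read_csv(r"C:\iris.csv")  # raw string so the backslash is taken literally
my_frame = h2o.H2OFrame(df, destination_frame="my_frame")
aml = H2OAutoML(max_runtime_secs=100)
aml.train(y='class', training_frame=my_frame)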
Looking in the Flow UI shows that this simple example generated 5 new frames, and 74 models. Is there a session ID tag or something similar that I can use to identify these separately from any objects created in another session, so I can remove them?
h2o.remove(aml)? This should delete the AutoML instance on the backend and cascade to all the submodels. It won't delete the training frame, though. – Seb
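Building on the comment above, one possible way to get the "list of session objects" is to keep it on the client side rather than asking H2O for it. The following is only a minimal sketch under stated assumptions: SessionRegistry, frame_id, track, cleanup and the session ID "sess42" are hypothetical names invented for illustration, not part of the h2o package. The removal function is passed in, so the class itself never touches the cluster. In real use it would be h2o.remove, which, as far as I understand, accepts a string key.

```python
from typing import Callable, List


class SessionRegistry:
    """Client-side record of the H2O keys one session created (hypothetical helper)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._keys: List[str] = []

    def frame_id(self, name: str) -> str:
        """Build a session-prefixed frame ID and record it for later removal."""
        key = f"{self.session_id}_{name}"
        self.track(key)
        return key

    def track(self, *keys: str) -> None:
        """Record keys in insertion order, skipping duplicates."""
        for key in keys:
            if key not in self._keys:
                self._keys.append(key)

    def cleanup(self, remove: Callable[[str], None]) -> List[str]:
        """Call remove(key) for each tracked key, newest first.

        Returns the keys whose removal raised; they stay tracked for a retry.
        """
        failed: List[str] = []
        for key in reversed(self._keys):
            try:
                remove(key)
            except Exception:
                failed.append(key)
        self._keys = list(reversed(failed))
        return failed


# Usage against a live cluster (not executed here; assumes h2o.init() succeeded
# and that df is the pandas DataFrame from the question):
#   reg = SessionRegistry("sess42")
#   my_frame = h2o.H2OFrame(df, destination_frame=reg.frame_id("my_frame"))
#   aml = H2OAutoML(max_runtime_secs=100)
#   aml.train(y="class", training_frame=my_frame)
#   reg.track(*aml.leaderboard["model_id"].as_data_frame()["model_id"].tolist())
#   reg.cleanup(h2o.remove)
```

Because each session only removes the keys it recorded itself, concurrent sessions should not interfere with one another the way remove_all() would. The sketch only covers the frames the session names explicitly and the models listed on the AutoML leaderboard. Any additional frames that AutoML creates internally are not captured, so it may still need to be combined with h2o.remove(aml) as suggested in the comment.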