0
votes

I am trying to create an ML application in which a front end takes user information and data, cleans it, and passes it to h2o AutoML for modeling, then recovers and visualizes the results. Since the back end will be a stand-alone / always-on service that gets called many times, I want to ensure that all objects created in each session are removed, so that h2o doesn't get cluttered and run out of resources. The problem is that many objects are being created, and I am unsure how to identify/track them, so that I can remove them before disconnecting each session.

Note that I would like the ability to run more than one analysis concurrently, which means I cannot just call remove_all(), since this may remove objects still needed by another session. Instead, it seems I need a list of session objects, which I can pass to the remove() method. Does anyone know how to generate this list?

Here's a simple example:

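import h2o
import pandas as pd
from h2o.automl import H2OAutoML

h2o.init()  # connect to a running H2O cluster, or start a local one

df = pd.read_csv(r"C:\iris.csv")  # raw string so the backslash is not read as an escape
my_frame = h2o.H2OFrame(df, destination_frame="my_frame")

aml = H2OAutoML(max_runtime_secs=100)
aml.train(y='class', training_frame=my_frame)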

Looking in the Flow UI shows that this simple example generated 5 new frames, and 74 models. Is there a session ID tag or something similar that I can use to identify these separately from any objects created in another session, so I can remove them?

[Screenshot: Frames Created]

[Screenshot: Models Created]

2
Did you try h2o.remove(aml)? This should delete the AutoML instance on the backend and cascade to all the submodels. It won't delete the training frame though. - Seb
@Seb, I thought I had tried this already, but maybe there was old data still there. When I tried it again, it worked! Please post as an answer so I can approve. Much appreciated... - Helenus the Seer

2 Answers

1
votes

The recommended way to clean only your work is to use h2o.remove(aml). This will delete the automl instance on the backend and cascade to all the submodels and attached objects like metrics. It won't delete the frames that you provided though (e.g. training_frame).
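
As a rough sketch of how a per-session cleanup might be organized around this, the helper below removes the AutoML run first and then the frames that session uploaded. `cleanup_session` and its `remove` parameter are hypothetical names, not part of h2o; in a real service you would pass `h2o.remove` as `remove`. The demonstration at the bottom uses a stand-in recorder so it runs without a cluster.

```python
def cleanup_session(aml, frames, remove):
    """Remove one session's AutoML run, then the frames that session uploaded.

    `remove` is assumed to behave like h2o.remove; it is passed in so the
    helper can be exercised without a running cluster.
    """
    removed = []
    # The AutoML object goes first: removing it cascades to its submodels,
    # but user-provided frames have to be removed explicitly afterwards.
    for obj in [aml, *frames]:
        if obj is None:  # e.g. training failed before the object existed
            continue
        remove(obj)
        removed.append(obj)
    return removed


# Stand-in demonstration: record what would be removed instead of calling h2o.
calls = []
result = cleanup_session("aml_run", ["my_frame", None], calls.append)
print(result)
```

In the service itself, calling `cleanup_session(aml, [my_frame], h2o.remove)` from a `finally` block would let the cleanup run even if training raises.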

1
votes

You can use h2o.ls() to list the H2O objects. Then you can use h2o.remove('YOUR_key') to remove ones you don't want to keep.

For example:

#Create frame of objects
h_objects = h2o.ls()
#Filter for keys of one AutoML session
filtered_objects = h_objects[h_objects['key'].str.contains('AutoML_YYYYMMDD_xxxxxx')]
for key in filtered_objects['key']:
    h2o.remove(key)

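Alternatively, you can remove all AutoML objects using the filter below instead. Note that this also matches objects from any other AutoML run on the same cluster, so it is not safe when several sessions run concurrently.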

filtered_objects = h_objects[h_objects['key'].str.lower().str.contains('automl')]
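
If several sessions share one cluster, the per-session filter can be wrapped in a small helper so each session removes only its own keys. This is a minimal sketch under assumptions: `keys_for_session` and `remove_session_keys` are hypothetical helpers, the keys in the demonstration are made up for illustration, and the matching is a plain substring test rather than the regex match that pandas' `str.contains` performs by default. In practice `keys` would come from `h2o.ls()['key']` and `remove` would be `h2o.remove`.

```python
def keys_for_session(keys, session_tag):
    """Return the keys that contain session_tag (plain substring match)."""
    return [key for key in keys if session_tag in key]


def remove_session_keys(keys, session_tag, remove):
    """Call remove(key) for every key belonging to one session.

    `remove` is assumed to behave like h2o.remove when given a key string.
    """
    matched = keys_for_session(keys, session_tag)
    for key in matched:
        remove(key)
    return matched


# Made-up keys for illustration only; real keys come from h2o.ls().
all_keys = [
    "my_frame",
    "GBM_1_AutoML_20200101_000001",
    "GLM_1_AutoML_20200101_000001",
    "GBM_1_AutoML_20200101_000002",
]
removed = []
remove_session_keys(all_keys, "AutoML_20200101_000001", removed.append)
print(removed)
```

Keys from the second run and the user frame are left alone, which is the property a concurrent service needs.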