I'm wondering how to automatically tune my scikit-learn random forest model with Amazon SageMaker. For now, I'd like to tune a single hyperparameter, "max_depth". I'll dump my code first and explain my concerns after.
FILE: notebook.ipynb
from sagemaker.tuner import IntegerParameter, HyperparameterTuner

estimator = sagemaker.estimator.Estimator(image, role,
                                          train_instance_count=1,
                                          train_instance_type='ml.m4.xlarge',
                                          output_path=output_location,
                                          sagemaker_session=sagemaker_session)

hyperparameter_ranges = {'max_depth': IntegerParameter(20, 30)}
objective_metric_name = 'score'
metric_definitions = [{'Name': 'score', 'Regex': 'score: ([0-9\\.]+)'}]

tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=9,
                            max_parallel_jobs=3)
tuner.fit({'train': train_data_location, 'test': test_data_location})
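For context, my understanding is that the `Regex` in `metric_definitions` is matched against the training job's log output, so I sanity-checked it locally against the line my train script prints (the `0.937` value below is just a made-up example score):

```python
import re

# The pattern from metric_definitions, matched against a sample log line
# in the format the train script prints.
pattern = r'score: ([0-9\.]+)'
log_line = 'score: 0.937\n'

match = re.search(pattern, log_line)
print(match.group(1))  # prints 0.937
```

So the regex itself seems fine; the capture group picks up the printed score.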
FILE: train (located in docker container)
def train():
    # param_path points at hyperparameters.json inside the container
    with open(param_path, 'r') as tc:
        hyperparams = json.load(tc)
    print("DEBUG VALUE: ", hyperparams)
    train_data = get_data()  # abstraction
    X, y = train_data.drop(['class'], axis=1), train_data['class']
    clf = RandomForestClassifier()
    clf.fit(X, y)
    print("score: " + str(evaluate_model(clf)) + "\n")
I see two issues with this code. First, if I put a JSON object {'max_value': 2} in a file named hyperparameters.json at the required path, the print statement outputs {}, as if the file were empty.
The second issue is that train() doesn't let hyperparameters affect the training in any way, shape, or form. As far as I can tell, Amazon has no documentation on the inner workings of the tuner.fit() method, so I can't figure out how train() is supposed to access the hyperparameter values being tested.
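To be concrete, here is roughly what I expected to be able to do once hyperparameters.json is populated. This is a sketch under the assumption that the tuner writes its chosen values into that file as strings (simulated in-memory below, since I can't confirm the actual behavior):

```python
import json

# Simulated contents of hyperparameters.json; assumption: the tuner
# writes its chosen value for max_depth there, serialized as a string.
raw = '{"max_depth": "25"}'
hyperparams = json.loads(raw)

# Cast before handing to RandomForestClassifier(max_depth=...),
# since JSON string values won't be accepted as integers.
max_depth = int(hyperparams['max_depth'])
print(max_depth)  # prints 25
```

But since the file reads back as {}, I can't even get to this step.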
Any help is appreciated, let me know if I can provide more code or clarify anything.