Embarrassingly parallel hyperparameter search via Azure + Databricks + MLflow

Question

Conceptual question. My company is pushing Azure + Databricks. I am trying to understand where this can take us.

I am porting some work I've done locally to the Azure + Databricks platform. I want to run an experiment with a large number of hyperparameter combinations using Azure + Databricks + MLflow. I am using PyTorch to implement my models.

I have a cluster with 8 nodes. I want to kick off the parameter search across all of the nodes in an embarrassingly parallel manner (one run per node, running independently). Is this as simple as creating an MLflow project and then calling mlflow.projects.run for each hyperparameter combination, with Databricks + MLflow taking care of the rest?

Is this technology capable of this? I'm looking for some references I could use to make this happen.

I ended up switching to the Azure machine learning Python SDK. — danelliottster

Daniel Daniel · Accepted Answer · 2020-07-17T11:52:54

The short answer is yes, it's possible, but it won't be quite as easy as running a single MLflow command. You can parallelize single-node workflows using Spark Python UDFs; a good example of this is this notebook.
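To make the UDF idea concrete, here is a minimal sketch, not taken from the notebook mentioned above. It assumes an existing SparkSession (called spark in Databricks notebooks) and uses a hypothetical train_one function as a stand-in for a real single-node PyTorch training loop; the parameter names lr and hidden are also just illustrative.

```python
import itertools


def build_grid(space):
    """Expand {"name": [values, ...]} into a list of parameter dicts."""
    keys = sorted(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def train_one(lr, hidden):
    """Hypothetical stand-in for a single-node training run.

    A real version would build and fit the PyTorch model here and
    return its validation loss as a plain Python float.
    """
    return (lr - 0.01) ** 2 + 1.0 / hidden


def search_with_spark(spark, grid):
    """Evaluate every parameter combination inside a Spark Python UDF."""
    # Imported lazily so the helpers above also work without PySpark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    train_udf = udf(train_one, DoubleType())
    rows = [(float(g["lr"]), int(g["hidden"])) for g in grid]
    df = spark.createDataFrame(rows, ["lr", "hidden"])
    # Repartition so the rows can be spread across separate tasks.
    df = df.repartition(len(rows))
    result = df.withColumn("loss", train_udf("lr", "hidden")).collect()
    return [row.asDict() for row in result]


# Usage on a cluster (assumes `spark` already exists):
# grid = build_grid({"lr": [0.1, 0.01, 0.001], "hidden": [32, 64]})
# results = search_with_spark(spark, grid)
# best = min(results, key=lambda r: r["loss"])
```

How many runs land on each node depends on how the cluster's task slots are configured, so "one run per node" is not guaranteed by this sketch alone.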

I'm not sure if this will work with PyTorch, but there is the Hyperopt library, which lets you parallelize a search across parameters using Spark. It's integrated with MLflow and available in the Databricks ML Runtime. I've only used it with scikit-learn, but it may be worth checking out.
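A rough sketch of what the Hyperopt route might look like, using its SparkTrials class to distribute trials. The train_and_validate function is a hypothetical placeholder for the actual model training, and the search space, parallelism=8 (chosen to match the 8 nodes in the question), and max_evals values are assumptions for illustration only.

```python
def train_and_validate(params):
    """Hypothetical stand-in for training a model and returning its loss.

    A real version would train the model with these hyperparameters
    and return the validation loss as a float.
    """
    return (params["lr"] - 0.01) ** 2 + 1.0 / params["hidden"]


def run_search(max_evals=32, parallelism=8):
    """Run a Hyperopt search with trials distributed over Spark workers."""
    # Imported lazily so train_and_validate can be used without Hyperopt.
    from hyperopt import STATUS_OK, SparkTrials, fmin, hp, tpe

    def objective(params):
        # Hyperopt minimizes the "loss" entry of the returned dict.
        return {"loss": train_and_validate(params), "status": STATUS_OK}

    space = {
        # Log-uniform between exp(-7) and exp(-2).
        "lr": hp.loguniform("lr", -7, -2),
        "hidden": hp.choice("hidden", [32, 64, 128]),
    }
    # Each trial runs as a Spark task; parallelism caps concurrent trials.
    trials = SparkTrials(parallelism=parallelism)
    return fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )


# Usage on a cluster with Spark and Hyperopt available:
# best = run_search()
```

Note that for hp.choice parameters, fmin reports the index of the chosen option rather than the value itself.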
