I am trying to use MLflow to log artifacts to Azure Blob Storage. Logging to DBFS works fine, but when I log to Azure Blob Storage I only see a folder named after the run ID in the container, and it is empty.
Here is what I do:
- Create an experiment from Azure Databricks, give it a name, and set the artifact location to wasbs://[email protected]/.
- In the Spark cluster, in the Environment Variables section, set AZURE_STORAGE_ACCESS_KEY="ValueoftheKey" (a quick sanity check of this from the notebook is shown after the snippet below).
- In the notebook, use MLflow to log the metrics, params, and finally the model with a snippet like the one below:
import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet

# alpha, l1_ratio, the train/test splits and eval_metrics are defined earlier in the notebook
with mlflow.start_run():
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)
    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    # Params and metrics go to the tracking server; the model is written to the artifact store
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(lr, "model")
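The sanity check referenced above is just a sketch (it assumes the MLflow Python client in the notebook and that the cluster environment variable actually reaches the driver):

import os
import mlflow

with mlflow.start_run():
    # Should print something like wasbs://[email protected]/<run_id>/artifacts
    print("artifact uri:", mlflow.get_artifact_uri())
    # Should be True if AZURE_STORAGE_ACCESS_KEY from the cluster config is visible here
    print("key visible:", os.environ.get("AZURE_STORAGE_ACCESS_KEY") is not None)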
Of course, before running this, I set the experiment to the one whose artifact location I defined as Azure Blob Storage:
experiment_name = "/Users/[email protected]/mltestazureblob"
mlflow.set_experiment(experiment_name)
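For completeness, I believe the same experiment could also be created from the notebook with an explicit artifact location instead of through the UI (a sketch only; it reuses the placeholder container and account names from above):

import mlflow

experiment_name = "/Users/[email protected]/mltestazureblob"
artifact_location = "wasbs://[email protected]/"

# create_experiment only succeeds the first time; after that, set_experiment is enough
if mlflow.get_experiment_by_name(experiment_name) is None:
    mlflow.create_experiment(experiment_name, artifact_location=artifact_location)
mlflow.set_experiment(experiment_name)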
I can see the metrics and params in the MLflow UI within Databricks, but since my artifact location is Azure Blob Storage, I expect the model, the .pkl file, and the conda.yaml to end up in the container. When I go to check, I only see a folder corresponding to the run ID of the experiment, with nothing inside it.
I do not know what I am missing. In case someone needs additional details, I will be happy to provide them.
Point to note: everything works fine when I use the default location, i.e. DBFS.
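For reference, something along these lines could be used to confirm what is actually under the run folder (a sketch assuming the azure-storage-blob v12 SDK; the account, container, and run ID below are placeholders):

import os
from azure.storage.blob import ContainerClient

# Placeholder account/container names taken from the wasbs:// URL above
container = ContainerClient(
    account_url="https://containerid.blob.core.windows.net",
    container_name="mlflowartifact",
    credential=os.environ["AZURE_STORAGE_ACCESS_KEY"],
)

run_id = "<run_id>"  # the folder that shows up in the container
for blob in container.list_blobs(name_starts_with=run_id):
    # I expect entries like <run_id>/artifacts/model/model.pkl and conda.yaml here
    print(blob.name)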