
I am trying to use MLflow to log artifacts to Azure Blob Storage. Logging to DBFS works fine, but when I log to Azure Blob Storage I only see a folder named after the run ID, with no contents inside it.

Here is what I do:

  1. Create an experiment from Azure Databricks, giving it a name and setting the artifact location to wasbs://[email protected]/ .

  2. In the Spark cluster configuration, in the Environment Variables section, set AZURE_STORAGE_ACCESS_KEY="ValueoftheKey".

  3. In the notebook, use MLflow to log metrics, params, and finally the model, using a snippet like the one below:

with mlflow.start_run():
      lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
      lr.fit(train_x, train_y)

      predicted_qualities = lr.predict(test_x)

      (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

      print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
      print("  RMSE: %s" % rmse)
      print("  MAE: %s" % mae)
      print("  R2: %s" % r2)

      mlflow.log_param("alpha", alpha)
      mlflow.log_param("l1_ratio", l1_ratio)
      mlflow.log_metric("rmse", rmse)
      mlflow.log_metric("r2", r2)
      mlflow.log_metric("mae", mae)

      mlflow.sklearn.log_model(lr, "model")

Of course, before running it, I set the experiment to the one whose artifact store is Azure Blob Storage:

experiment_name = "/Users/[email protected]/mltestazureblob"
mlflow.set_experiment(experiment_name)
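As a sanity check on the artifact location from step 1, a small helper like the one below can split the URI into its parts. This is only a sketch: it assumes the location follows the `wasbs://<container>@<storage-account>.blob.core.windows.net/<path>` form, and `parse_wasbs_uri`, `mycontainer`, and `myaccount` are hypothetical names used for illustration.

```python
from urllib.parse import urlparse

# Assumed host suffix for Azure Blob Storage endpoints.
BLOB_SUFFIX = ".blob.core.windows.net"


def parse_wasbs_uri(uri):
    """Split a wasbs:// artifact URI into (container, account, path).

    Hypothetical helper, not part of MLflow.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb(s):// URI: %r" % uri)
    # urlparse exposes the part before '@' as username and the rest as hostname.
    container = parsed.username
    host = parsed.hostname or ""
    if not container or not host.endswith(BLOB_SUFFIX):
        raise ValueError("expected <container>@<account>%s in %r" % (BLOB_SUFFIX, uri))
    account = host[: -len(BLOB_SUFFIX)]
    return container, account, parsed.path.lstrip("/")


# Placeholder values, not the real container or account.
print(parse_wasbs_uri("wasbs://mycontainer@myaccount.blob.core.windows.net/mlflow"))
```

If the container or account comes back different from what you expect, the experiment is pointing somewhere other than the container you are inspecting.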

I can see the metrics and params in the MLflow UI within Databricks. Since my artifact location is Azure Blob Storage, I expect the model, the .pkl file, and the conda.yaml file to be in the container in Azure Blob Storage, but when I check it I only see a folder named after the run ID, with nothing inside.

I do not know what I am missing. If anyone needs additional details, I will be happy to provide them.

Note that everything works fine when I use the default location, i.e. DBFS.

I'm logging from my local machine and I have the same problem with Azure Blob Storage: nothing seems to be tracked. Did you manage to solve this, and if so, how? - bachr

1 Answer


It seems the problem was with Azure Storage Explorer: it does not show the contents of the folder (the .pkl, conda.yaml, and model files). When I used Storage Explorer (preview) in the Azure portal instead, I was able to view the contents, although that also does not seem very stable.

I will file a bug with the Azure Storage Explorer team so they can take a look. I used version 1.10.1 of Azure Storage Explorer.
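To check what was actually uploaded without relying on a storage browser, the artifacts can also be listed through MLflow itself. The following is a minimal sketch, assuming `mlflow` is installed and the tracking URI and storage credentials are already configured. `iter_artifact_files` is a hypothetical helper; it only relies on `MlflowClient.list_artifacts(run_id, path)`, which returns entries with `path`, `is_dir`, and `file_size`.

```python
def iter_artifact_files(client, run_id, path=None):
    """Yield (path, file_size) for every file logged under a run.

    Hypothetical helper: `client` is expected to behave like
    mlflow.tracking.MlflowClient, i.e. expose list_artifacts(run_id, path).
    """
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            # Recurse into sub-directories such as "model".
            yield from iter_artifact_files(client, run_id, info.path)
        else:
            yield info.path, info.file_size


# Usage sketch (replace "<run_id>" with the ID of your own run):
#
#   from mlflow.tracking import MlflowClient
#   for artifact_path, size in iter_artifact_files(MlflowClient(), "<run_id>"):
#       print(artifact_path, size)
```

If this prints the model files while the storage browser shows an empty folder, the upload worked and the issue is on the viewing side.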