1
votes

I'm trying to use workspace environment variables to pass access tokens into my custom cluster init scripts.

It appears that there are only a few supported environment variables that we can access in our custom cluster init scripts as described at https://docs.databricks.com/clusters/init-scripts.html#environment-variables

I've attempted to write to the base cluster configuration using

Microsoft.Azure.Databricks.Client.SparkEnvironmentVariables.Add("WORKSPACE_ID", workspaceId)

My init scripts still fail to pick up this variable in the following line:

[[ -z "${WORKSPACE_ID}" ]] && LOG_ANALYTICS_WORKSPACE_ID='default' || LOG_ANALYTICS_WORKSPACE_ID="${WORKSPACE_ID}"
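A minimal sketch of another way to write the same fallback, using the shell's default-value expansion. It assumes the script might run with `set -u` (nounset), under which referencing an unset `${WORKSPACE_ID}` aborts the script; whether that is what happens here is not confirmed. The function name `resolve_workspace_id` is hypothetical.

```shell
#!/bin/bash
# Treat references to unset variables as errors, so a missing variable
# is reported instead of silently expanding to an empty string.
set -u

# Hypothetical helper: print WORKSPACE_ID, or 'default' when the
# variable is unset or empty. The ${VAR:-word} form is safe under set -u.
resolve_workspace_id() {
  printf '%s\n' "${WORKSPACE_ID:-default}"
}

LOG_ANALYTICS_WORKSPACE_ID="$(resolve_workspace_id)"
echo "Using Log Analytics workspace ID: ${LOG_ANALYTICS_WORKSPACE_ID}"
```

If the echoed value is `default` in the init script log, the variable did not reach the script's environment.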

With the above lines of code, my init script causes the cluster to fail with the following error:

Spark Error: Spark encountered an error on startup. This issue can be caused by
invalid Spark configurations or malfunctioning init scripts. Please refer to the Spark
driver logs to troubleshoot this issue, and contact Databricks if the problem persists.
Internal error message: Spark error: Driver down

The logs don't say that any part of my bash script is failing, so I'm assuming that it's just failing to pick up the variable from the environment variables.

Has anyone else dealt with this problem? I realize that I could write this information to DBFS and then read it in the init script, but I'd like to avoid doing that since I'll be passing in access tokens. What other approaches can I try?

Thanks for any help!

1
As part of your init script, run env and direct the output to a known file location, for example env >> /dbfs/output.log, or check the log output from the cluster directly. This way you can at least see which environment variables are available. Having that result will make debugging your code much easier. - Majid
Also make sure you check the logs for both the driver and the executors. The UI only shows the driver logs, which may not be helpful in some cases. It would be best to ship the logs to a bucket; then you will have logs from the driver and all executors, which you can search to understand what is actually happening. - Majid
Did you figure out the solution to this? I have the exact same use case. - Fizi
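The debugging suggestion in the comments above can be sketched as follows. Because the question involves access tokens, this variant records only the variable names, not their values. The function name `dump_env_names`, the `ENV_DUMP_FILE` variable, and the default path under `/tmp` are assumptions made for illustration; point the output at whatever log location your cluster uses.

```shell
#!/bin/bash

# Hypothetical helper: write the sorted names of all environment
# variables (names only, never values) to the file given as $1.
dump_env_names() {
  out_file="$1"
  # awk's ENVIRON array holds the process environment; printing only
  # the keys avoids leaking secrets and handles multi-line values.
  awk 'BEGIN { for (name in ENVIRON) print name }' | sort > "$out_file"
}

# Assumed log location; override by setting ENV_DUMP_FILE.
dump_env_names "${ENV_DUMP_FILE:-/tmp/init-script-env-names.log}"
```

After the cluster attempts to start, search the file for `WORKSPACE_ID` to see whether the variable was visible to the init script.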

1 Answer

0
votes

This article shows how to send application logs and metrics from Azure Databricks to a Log Analytics workspace. It uses the Azure Databricks Monitoring Library, which is available on GitHub.

Prerequisites: Configure your Azure Databricks cluster to use the monitoring library, as described in the GitHub readme.

Steps to build the Azure monitoring library and configure an Azure Databricks cluster:

Step 1: Build the Azure Databricks monitoring library

Step 2: Create and configure the Azure Databricks cluster

For more details, refer to "Monitoring Azure Databricks".

Hope this helps.