2
votes

I want to try out cluster-scoped init scripts on an Azure Databricks cluster. I'm struggling to see which commands are available.

Basically, I've got a file on dbfs that I want to copy to a local directory /tmp/config when the cluster spins up.

So I created a very simple bash script:

#!/bin/bash
mkdir -p /tmp/config
databricks fs cp dbfs:/path/to/myFile.conf /tmp/config

Spinning up the cluster fails with "Cluster terminated. Reason: Init Script Failure". Looking at the log on dbfs, I see the error

bash: line 1: databricks: command not found

OK, so databricks is not available as a command. That's the command I use in my local shell to copy files to and from dbfs.

What other commands are available to copy a file from dbfs? And more generally: which commands are actually available?

2
Try compgen -a. - stephanmg
Thanks, didn't know compgen. Problem is that I can't see the output of the init script. Tried a redirect, in the init script I have compgen -a >&2 but no output is written to the error log. - pgruetter
Did you ever fix this? Same issue here. - Union find
No, unfortunately I didn't. We changed our application anyway, so the configuration file I tried to copy doesn't need to be on the classpath anymore. So now I don't need an init script at all. - pgruetter

2 Answers

2
votes

DBFS is mounted on the cluster nodes at /dbfs, so you can copy the file with plain cp in your shell script:

e.g.

cp /dbfs/your-folder/your-file.txt ./your-file.txt

If you list the /dbfs location (e.g. with dir), you get back all the folders and data you have in DBFS.

You can also first test it in a notebook via

%sh
cd /dbfs
dir
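Putting this together, a minimal sketch of an init script based on the /dbfs mount might look like the following. The paths are placeholders (substitute your own), and the fallback block exists only so the script can be exercised outside a cluster, where /dbfs is not mounted; a real init script would contain just the mkdir and cp lines.

```shell
#!/bin/bash
set -euo pipefail

# Placeholder paths -- substitute your own file and target directory.
SRC="/dbfs/path/to/myFile.conf"
DST_DIR="/tmp/config"

# Local-testing shim only: outside a Databricks cluster /dbfs is not
# mounted, so fall back to a scratch file to exercise the copy logic.
if [ ! -e "$SRC" ]; then
  SRC="$(mktemp)"
  echo "key=value" > "$SRC"
fi

# The actual init-script work: create the target dir and copy the file
# through the DBFS FUSE mount using plain POSIX tools (no CLI needed).
mkdir -p "$DST_DIR"
cp "$SRC" "$DST_DIR/"
```

Note the use of ordinary cp against the /dbfs mount point, which avoids the databricks CLI entirely.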
1
vote

By default, the Databricks CLI is not installed on a Databricks cluster. That's the reason you see the error message bash: line 1: databricks: command not found.

To copy the file, you should use dbutils commands instead, as shown below.

dbutils.fs.mkdirs("/tmp/config")
dbutils.fs.mv("/configuration/proxy.conf", "/tmp/config")


Reference: Databricks Utilities

Hope this helps.