4
votes

I'd like to edit Databricks notebooks locally using my favorite editor, and then use Databricks Connect to run the notebook remotely on a Databricks cluster that I usually access via the web interface.

Unfortunately, after searching the web for a couple of days, I can't find detailed documentation on Databricks Connect.

I run databricks-connect configure, as suggested on the PyPI page above, but I'm not sure what some of the settings are. Could someone please walk me through this (like where to find these values in the web interface) or provide a link to proper documentation?

I know what some of the settings should be, but I'll include everything that comes up when running databricks-connect configure, for completeness and benefit of others.

Databricks Host
Databricks Token
Cluster ID (e.g., 0921-001415-jelly628)
Org ID (Azure-only, see ?o=orgId in URL)
Port (is it spark.databricks.service.port?)
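For reference, here is my rough understanding (a sketch, not authoritative): `databricks-connect configure` appears to write these values to a JSON file at `~/.databricks-connect`, with placeholder values it would look something like this:

```json
{
  "host": "https://<region>.azuredatabricks.net",
  "token": "dapi<personal-access-token>",
  "cluster_id": "0921-001415-jelly628",
  "org_id": "<the ?o= value from the workspace URL>",
  "port": "15001"
}
```

If that's right, then the web interface is where you'd collect the host (your workspace URL), the token (User Settings > Access Tokens), the cluster ID (from the cluster page URL), and the org ID (the `?o=` query parameter, Azure only).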

Also, and I think this is what I'm most interested in, do I need to make any changes in the notebook itself, such as defining a SparkContext? If so, with what configuration?

And how should I run it? After running databricks-connect configure, no "magic" seems to be happening. When I run jupyter notebook, it still runs locally and doesn't seem to know to forward anything to a remote cluster.

Update: If you'd like to think of something more concrete, in Databricks' web interface, dbutils is a predefined object. How do I refer to it when running a notebook remotely?
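For what it's worth, the pattern I've seen suggested for this (I can't confirm it's the official one, so treat it as an assumption) is to construct dbutils explicitly from the Spark session when running outside the Databricks web interface:

```python
# Assumption: databricks-connect is installed and configured, so that
# pyspark.dbutils is available and the session connects to the remote cluster.
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)  # stands in for the predefined object in the web UI
```

Inside the Databricks web interface this would be unnecessary, since `spark` and `dbutils` are predefined there.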

2
Ideally you want to stick to a single question at a time. This looks like you are asking for a walk-through or tutorial, and SO is not a great source for that. It is unlikely you are going to get much traction here. - user1531971
I guess... But if there's a good documentation somewhere, I'd be pretty happy with just a link. - Arseny
If I had to choose one question, it would be "Do I need to change anything in the notebook to be able to run it?" I think I could work my way through it from a starting point like that. - Arseny
Unfortunately, that isn't really how SO works. Basically, this project looks like a thin wrapper around the Azure API, so your starting guess would be that the config refers directly back to that. - user1531971
It's currently in private preview so no documents available. You can try going through your Microsoft account manager to get on the preview. - simon_dmorias

2 Answers

3
votes

I had marked another person's reply as the answer, but that reply is gone now for some reason.

For my purposes, the official user guide worked: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html

1
votes

In short, you will need to include:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

at the start of your scripts. Notebooks should convert, but of course magic commands (%run etc.) will not work.

More detail on the parts that will not work is available here: https://datathirst.net/blog/2019/3/7/databricks-connect-finally
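As a usage note (assuming databricks-connect is installed and configured), the package ships a built-in connectivity check that confirms the local session can actually reach the remote cluster before you try running your own scripts:

```shell
# Verifies the configuration in ~/.databricks-connect by running
# a small job against the configured cluster.
databricks-connect test
```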