4 votes

I am trying to set up Databricks Connect so that I can work with a remote Databricks cluster already running in a workspace on Azure. When I run the command 'databricks-connect test' it never finishes.

I followed the official documentation.

I've installed the most recent Anaconda (Python 3.7) and created a local environment:

    conda create --name dbconnect python=3.5

I've installed 'databricks-connect' in version 5.1, which matches the configuration of my cluster on Azure Databricks:

    pip install -U databricks-connect==5.1.*
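To make sure the environment really picked up a matching version, 'pip show' can confirm what got installed (a quick extra check, not part of the official steps):

    pip show databricks-connect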

I've already run 'databricks-connect configure', with the following result:

    (base) C:\>databricks-connect configure
    The current configuration is:
    * Databricks Host: ******.azuredatabricks.net
    * Databricks Token: ************************************
    * Cluster ID: ****-******-*******
    * Org ID: ****************
    * Port: 8787
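For reference, as far as I know (this is version-dependent, so treat it as an assumption), the CLI stores these values in a JSON file named '.databricks-connect' in the home directory, roughly of this shape:

    {
      "host": "https://******.azuredatabricks.net",
      "token": "************************************",
      "cluster_id": "****-******-*******",
      "org_id": "****************",
      "port": "8787"
    }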

After the above steps I run the 'test' command for Databricks Connect:

    databricks-connect test

and the procedure starts but hangs after a warning about MetricsSystem, as shown below:

    (dbconnect) C:\>databricks-connect test
    * PySpark is installed at c:\users\miltad\appdata\local\continuum\anaconda3\envs\dbconnect\lib\site-packages\pyspark
    * Checking java version
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
    * Testing scala command
    19/05/31 08:14:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    19/05/31 08:14:34 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set. 

I expect the process to move on to the next steps, as shown in the official documentation:

    * Testing scala command
    18/12/10 16:38:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    18/12/10 16:38:50 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
    18/12/10 16:39:53 WARN SparkServiceRPCClient: Now tracking server state for 5abb7c7e-df8e-4290-947c-c9a38601024e, invalidating prev state
    18/12/10 16:39:59 WARN SparkServiceRPCClient: Syncing 129 files (176036 bytes) took 3003 ms
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
          /_/

    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
    Type in expressions to have them evaluated.
    Type :help for more information.

So my process stops after 'WARN MetricsSystem: Using default name SparkStatusTracker'.

What am I doing wrong? Is there something more I should configure?

It looks like this feature is in private preview, so I wonder if that is causing the issue. – Jon
@Jon Yes, I confirm: that feature is in preview, but I found on internet forums that people do use it. My case seems to be some technical problem specific to my configuration, but I don't know what I should check or fix. – Miłosz Tadrzak
Oh interesting. I was going to try it myself when I got a chance today. I'll let you know how it goes :) – Jon
Fantastic, please let me know how you deal with it. I only have my company laptop to test it, so I also have a lot of security restrictions. I wonder how it might behave on another configuration. Good luck, Jon. – Miłosz Tadrzak
Lots of people seem to be seeing this issue with the test command on Windows. But if you actually use Databricks Connect, it works fine. – simon_dmorias

2 Answers

0 votes

Lots of people seem to be seeing this issue with the test command on Windows, but if you actually use Databricks Connect it works fine. It seems safe to ignore.
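For example, a minimal end-to-end check (a sketch, assuming 'databricks-connect configure' has been completed as in the question) is to run a trivial job from a plain Python session; with Databricks Connect installed, an ordinary SparkSession is routed to the remote cluster:

    # With databricks-connect installed, getOrCreate() connects to the
    # remote cluster configured via 'databricks-connect configure'.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(100)  # this job runs on the remote Azure Databricks cluster
    print(df.count())      # should print 100 if the connection works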

0 votes

Looks like this feature isn't officially supported on runtimes 5.3 or below. If there are limitations on updating the runtime, I would make sure the Spark conf on the cluster is set as follows:

    spark.databricks.service.server.enabled true

However, with the older runtimes things still might be wonky. I would recommend doing this with runtime 5.5 or 6.1 or above.
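To verify the setting actually took effect (a quick check, assuming you can open a notebook attached to the cluster), read the conf back from the cluster side:

    # Run in a notebook attached to the cluster, not through databricks-connect;
    # 'spark' here is the session the notebook provides.
    spark.conf.get("spark.databricks.service.server.enabled")  # expect 'true'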