
Hope you are all doing well.

We are currently exploring options to load SQL Server tables using PySpark in Databricks. We have varied sources, including files and tables. We are using Python as the base language, as it is easier to link with our existing code base.

Question 01:

We have been recommended the Spark Connector for connecting to SQL Server (both on-premises and in the cloud):

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector

The above link from Microsoft clearly shows that Scala is a dependency. Can the above connector only be used with Scala, or can it be used with Python as well? If so, how do we invoke its drivers and methods?

Question 02:

What is the best way to include/import/access libraries and drivers from JAR files or other Maven libraries in Python code? In Python we normally have a module from which we import the required libraries. Say we have a couple of libraries installed in Databricks using Maven coordinates, plus other standalone JARs; how do we access them inside Python scripts?

I hope the above details are sufficient. I thank you all in advance for all the help and suggestions. Cheers...


1 Answer


Looks like someone found a solution, though without the Databricks context. Please see the following Stack Overflow post: How to use azure-sqldb-spark connector in pyspark
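For reference, the approach in that post drives the connector's Scala classes directly through PySpark's py4j gateway. Below is a rough sketch of the write path it describes; it assumes the azure-sqldb-spark JAR is already installed on the cluster, the class and method names come from that post rather than from verified connector documentation, and all connection values are placeholders, so treat it as untested:

```python
# Sketch of the py4j bridge described in the linked Stack Overflow post.
# Assumes the azure-sqldb-spark JAR is installed on the Databricks cluster;
# server/database/credential values are placeholders.
sc = spark.sparkContext

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Build the connector's Config object from a Python dict. py4j converts the
# dict to a java.util.Map, and Spark's PythonUtils turns that into the Scala
# Map that Config.apply expects.
config = sc._jvm.com.microsoft.azure.sqldb.spark.config.Config.apply(
    sc._jvm.org.apache.spark.api.python.PythonUtils.toScalaMap({
        "url": "<server>.database.windows.net",
        "databaseName": "<database>",
        "dbTable": "dbo.MyTable",
        "user": "<user>",
        "password": "<password>",
    }))

# Wrap the DataFrame's underlying Java object in the connector's
# DataFrameFunctions helper and write it out to SQL Server.
sc._jvm.com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions(df._jdf) \
    .sqlDB(config)
```

The same gateway trick is the general answer to Question 02: once a library is installed on the cluster (via Maven coordinates or an uploaded JAR), its classes are reachable from Python under `spark._jvm`, though as raw py4j objects rather than as importable Python modules.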

In the meantime, can you please upvote and comment on the following UserVoice feature request: Implement python bindings for azure-sqldb-spark connector, which is currently under review.

For what is currently supported, please see Alberto's answer to the following Stack Overflow post: How to connect Azure SQL Database with Azure Databricks
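In short, that answer uses Spark's built-in JDBC data source, which works from Python today without any extra bindings; the Microsoft SQL Server JDBC driver ships with the Databricks runtime, so no additional library should be needed. A minimal sketch, with placeholder connection details:

```python
# Minimal JDBC read from Azure SQL Database / SQL Server, following the
# approach in Alberto's answer. Server, database, and credential values
# are placeholders.
jdbc_url = (
    "jdbc:sqlserver://<server>.database.windows.net:1433;"
    "database=<database>;encrypt=true;loginTimeout=30;"
)

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.MyTable")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

df.show(5)
```

The dbtable option also accepts a parenthesized subquery, e.g. "(SELECT col1, col2 FROM dbo.MyTable WHERE ...) AS t", if you only need part of a table.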