
I've been trying, with no success, to find out how to read an Azure Synapse table from Scala Spark. On https://docs.microsoft.com I found connectors for other Azure databases with Spark, but nothing for the new Azure Data Warehouse.

Does anyone know if it is possible?


2 Answers


Maybe I misunderstood your question, but normally you would use a JDBC connection in Spark to read data from a remote database.

Check this doc: https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html
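For illustration, here is a minimal sketch of a read using the connector from that doc; the server, database, credentials, storage account, and table names are all placeholders:

    // Sketch of a read via the Azure Synapse connector from the linked doc.
    // Every <name> below is a placeholder you must fill in.
    val df = spark.read
      .format("com.databricks.spark.sqldw")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>;user=<user>;password=<password>")
      // The connector stages data through Azure storage; tempDir points at that staging area.
      .option("tempDir", "abfss://<container>@<storage-account>.dfs.core.windows.net/tempdir")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "dbo.MyTable")
      .load()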

Keep in mind that Spark has to ingest the data from the Synapse tables into memory and perform the transformations there, so it does not push operations down into Synapse.

Normally, you want to run the SQL query against the source database and bring only the results of that SQL into a Spark DataFrame.
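As a sketch of that pattern with plain JDBC (connection details are placeholders; the "query" option sends the SQL to the source so only its result set lands in the DataFrame):

    // Plain JDBC read with the query executed on the source database.
    // Connection details and the query itself are placeholders.
    val results = spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
      .option("user", "<user>")
      .option("password", "<password>")
      // Only the result of this query is transferred into Spark.
      .option("query", "SELECT CustomerId, SUM(Amount) AS Total FROM dbo.Sales GROUP BY CustomerId")
      .load()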


It is now directly possible, with trivial effort (there is even a right-click option in the UI for this), to read data from a DEDICATED SQL pool in Azure Synapse (the new Analytics workspace, not just the DWH) from Scala (and unfortunately, ONLY Scala right now).

Within a Synapse workspace (there is of course a write API as well):

val df = spark.read.sqlanalytics("<DBName>.<Schema>.<TableName>")

If you are outside of the integrated notebook experience, you need to add the imports:

 import com.microsoft.spark.sqlanalytics.utils.Constants
 import org.apache.spark.sql.SqlAnalyticsConnector._
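Since the write API was mentioned above, the companion write call looked roughly like this at the time (a sketch; Constants.INTERNAL marks the target as an internal/managed table, and the table name is a placeholder):

    // Sketch of the companion write API; the table name is a placeholder.
    df.write.sqlanalytics("<DBName>.<Schema>.<TableName>", Constants.INTERNAL)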

It sounds like they are working on expanding this to the SERVERLESS SQL pool, as well as to other SDKs (e.g. Python).

Read the top portion of this article as a reference: https://docs.microsoft.com/en-us/learn/modules/integrate-sql-apache-spark-pools-azure-synapse-analytics/5-transfer-data-between-sql-spark-pool