
Is it possible to run a "SELECT" query against a SQL Server database from an Azure Synapse notebook using PySpark + SQL?

The only way I've been able to ingest data from a SQL Server database into Azure Synapse is by creating an integration pipeline.

I'm new to using Azure Synapse as well as Apache Spark, so any advice you can provide is much appreciated.


1 Answer


This is possible in theory, and I have tested it with an Azure SQL Database; I'm not 100% sure it would work with SQL Server. It requires the network security to be set up correctly, with a line of sight between the Spark pool and the database. Is your SQL Server in Azure, for example? Are they on the same vnet or peered vnets?

A simple example in a Synapse notebook:

import pyodbc

# Query to run; @@version just proves the connection works
sqlQuery = "select @@version v"

conn = None
cursor = None

try:
  # Replace the server, database and credentials with your own
  conn = pyodbc.connect( 'DRIVER={ODBC Driver 17 for SQL Server};'
                        'SERVER=someSynapseDB.sql.azuresynapse.net;'
                        'DATABASE=yourDatabaseName;UID=someReadOnlyUser;'
                        'PWD=youWish;', autocommit = True )

  cursor = conn.cursor()
  cursor.execute(sqlQuery)

  # Print each row of the result set
  row = cursor.fetchone()
  while row:
    print(row[0])
    row = cursor.fetchone()

finally:
  # Tidy up; the cursor or connection may never have been created if connect failed
  if cursor:
    cursor.close()
  if conn:
    conn.close()

My results:

(screenshot of the notebook output showing the SQL Server version string)

Inspired by this post by Jovan Popovic:

https://techcommunity.microsoft.com/t5/azure-synapse-analytics/query-serverless-sql-pool-from-an-apache-spark-scala-notebook/ba-p/2250968
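
If you specifically want to use the Spark engine rather than plain pyodbc, Spark's built-in JDBC data source can run the same kind of query. This is just a minimal sketch, assuming a SQL Server instance reachable from the Spark pool; the server name, database and credentials below are placeholders, and the Microsoft JDBC driver must be available on the pool (spark is the session that Synapse notebooks provide):

df = (spark.read
        .format("jdbc")
        .option("url", "jdbc:sqlserver://yourSqlServer.database.windows.net:1433;database=yourDatabaseName")
        .option("query", "select @@version v")  # statement pushed down to SQL Server
        .option("user", "someReadOnlyUser")
        .option("password", "youWish")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load())

# The result comes back as a Spark DataFrame
df.show(truncate=False)

The advantage of this route is that the result is a Spark DataFrame, so it can be joined or written onwards like any other Spark data, and the query option pushes the statement down to SQL Server rather than pulling a whole table across.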

Just out of interest, is there a particular reason you are doing this in notebooks? Synapse pipelines are a perfectly good way of doing it, and a typical pattern would be to stage the data in a data lake first. Is there some special functionality you need notebooks for?