I am trying to find the safest way to import several dimension and fact tables from SQL Server into Azure Data Lake Storage Gen2. This is what I have found so far:
Option 1: Azure Data Factory
This involves a cost and is therefore not the preferred solution for me at the moment.
Option 2: Python from Azure Databricks
2a) Apache Spark Connector
jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .load()
2b) Built-in JDBC Spark SQL Connector
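For reference, a minimal sketch of what I mean here, using the same placeholder variables (url, table_name, username, password) as in 2a; this assumes the Microsoft JDBC driver jar is on the cluster classpath:

# Spark's built-in JDBC data source with an explicit SQL Server driver.
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", url) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .load()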
2c) ODBC driver and pyodbc package
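A sketch of the pyodbc route (server and database are placeholders, and it assumes the "ODBC Driver 17 for SQL Server" is installed on the cluster):

import pyodbc
import pandas as pd

# Connect via the ODBC driver and pull the table into pandas first.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={server};DATABASE={database};UID={username};PWD={password}"
)
pdf = pd.read_sql(f"SELECT * FROM {table_name}", conn)
conn.close()

# Convert to a Spark DataFrame so it can be written out like the others.
jdbcDF = spark.createDataFrame(pdf)

As I understand it, this pulls the whole table through pandas on the driver node, unlike the Spark connectors in 2a/2b, which read in parallel across the cluster.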
2d) pymssql package
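Similarly for pymssql (same placeholders, same single-node caveat as 2c):

import pymssql
import pandas as pd

conn = pymssql.connect(server=server, user=username,
                       password=password, database=database)
pdf = pd.read_sql(f"SELECT * FROM {table_name}", conn)
conn.close()

jdbcDF = spark.createDataFrame(pdf)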
2e) JayDeBeApi
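And what I understand the JayDeBeApi variant to look like (the path to the mssql-jdbc jar is a placeholder):

import jaydebeapi

conn = jaydebeapi.connect(
    "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    url,                        # same JDBC URL as in 2a/2b
    [username, password],
    "/path/to/mssql-jdbc.jar",  # placeholder: path to the driver jar
)
curs = conn.cursor()
curs.execute(f"SELECT * FROM {table_name}")
rows = curs.fetchall()
curs.close()
conn.close()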
Option 3: SSIS package
I am not sure which of these I should use. What are the pros and cons of the above approaches?
Once I have read the data into a DataFrame using one of the above approaches, how do I save it to Data Lake Gen2 storage? This is what I have so far:
jdbcDF.write.parquet('dbfs:/path', mode='overwrite')
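That writes to DBFS. This is what I think a direct write to Gen2 would look like, assuming account-key authentication (storage_account, account_key and container are placeholders for my own values):

# Grant Spark access to the storage account (account key shown here;
# a service principal / OAuth config is the other common option).
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    account_key
)

# Write directly to an abfss:// path in ADLS Gen2.
jdbcDF.write.mode('overwrite').parquet(
    f"abfss://{container}@{storage_account}.dfs.core.windows.net/path"
)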