I am trying to find the safest way to import several dimension and fact tables from SQL Server into Azure Data Lake Storage Gen2. This is what I have found so far:
Option 1: Azure Data Factory
This involves a cost and is therefore not the preferred solution for me at the moment.
Option 2: Python from Azure Databricks
2a) Apache Spark Connector
jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .load()
2b) Built-in JDBC Spark SQL Connector
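For reference, a minimal sketch of what I mean here, using the same placeholder variables (url, table_name, username, password) as in 2a; this assumes the Microsoft JDBC driver jar is on the cluster classpath:

# Spark's built-in JDBC data source with an explicit SQL Server driver.
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", url) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .load()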
2c) ODBC driver and pyodbc package
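A sketch of the pyodbc route (server and database are placeholders, and it assumes the "ODBC Driver 17 for SQL Server" is installed on the cluster):

import pyodbc
import pandas as pd

# Connect via the ODBC driver and pull the table into pandas first.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={server};DATABASE={database};UID={username};PWD={password}"
)
pdf = pd.read_sql(f"SELECT * FROM {table_name}", conn)
conn.close()

# Convert to a Spark DataFrame so it can be written out like the others.
jdbcDF = spark.createDataFrame(pdf)

As I understand it, this pulls the whole table through pandas on the driver node, unlike the Spark connectors in 2a/2b, which read in parallel across the cluster.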
2d) pymssql package
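Similarly for pymssql (same placeholders, same single-node caveat as 2c):

import pymssql
import pandas as pd

conn = pymssql.connect(server=server, user=username,
                       password=password, database=database)
pdf = pd.read_sql(f"SELECT * FROM {table_name}", conn)
conn.close()

jdbcDF = spark.createDataFrame(pdf)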
2e) JayDeBeApi
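And what I understand the JayDeBeApi variant to look like (the path to the mssql-jdbc jar is a placeholder):

import jaydebeapi

conn = jaydebeapi.connect(
    "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    url,                        # same JDBC URL as in 2a/2b
    [username, password],
    "/path/to/mssql-jdbc.jar",  # placeholder: path to the driver jar
)
curs = conn.cursor()
curs.execute(f"SELECT * FROM {table_name}")
rows = curs.fetchall()
curs.close()
conn.close()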
Option 3: SSIS package
I am not sure which of these I should use. What are the pros and cons of the above approaches?
Once I have read the data into a DataFrame using one of the above approaches, how do I save it to Data Lake Gen2 storage? This is what I have so far:
jdbcDF.write.parquet('dbfs:/path', mode='overwrite')
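That writes to DBFS. This is what I think a direct write to Gen2 would look like, assuming account-key authentication (storage_account, account_key and container are placeholders for my own values):

# Grant Spark access to the storage account (account key shown here;
# a service principal / OAuth config is the other common option).
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    account_key
)

# Write directly to an abfss:// path in ADLS Gen2.
jdbcDF.write.mode('overwrite').parquet(
    f"abfss://{container}@{storage_account}.dfs.core.windows.net/path"
)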