
I am using the code below to generate a JSON result set from a SQL Server table.

PowerShell:

$InstanceName = "SQLTEST1\ENG_TST1"
$connectionString = "Server=$InstanceName;Database=dbadb;Integrated Security=True;"

$query = "SELECT * FROM dbo.sales"

# Open a connection to the SQL Server instance
$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = $connectionString
$connection.Open()

# Run the query and load the results into a DataTable
$command = $connection.CreateCommand()
$command.CommandText = $query
$result = $command.ExecuteReader()

$table = New-Object System.Data.DataTable
$table.Load($result)

# Convert the rows to JSON
$table | Select-Object $table.Columns.ColumnName | ConvertTo-Json

$connection.Close()

Could you please guide me on how to store these JSON documents in Azure Data Lake Storage Gen2 using Azure Databricks?


2 Answers


If you want to save files to Azure Data Lake Storage Gen2 from Azure Databricks, please refer to the following steps:

  1. Create an Azure Data Lake Storage Gen2 account.
az login
az storage account create \
    --name <account-name> \
    --resource-group <group name> \
    --location westus \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
  2. Create a service principal and assign the Storage Blob Data Contributor role to it, scoped to the Data Lake Storage Gen2 storage account. The appId, password, and tenant values returned by the command below are used in the mount configuration in step 4.
az login

az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>
  3. Create a Spark cluster in Azure Databricks.

  4. Mount Azure Data Lake Storage Gen2 in Azure Databricks (Python).

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<appId>",
           "fs.azure.account.oauth2.client.secret": "<clientSecret>",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
    mount_point = "/mnt/flightdata",
    extra_configs = configs)
  5. Save the JSON to Azure Data Lake Storage Gen2 (a verification sketch follows the code below).
dbutils.fs.put("/mnt/flightdata/<file name>", """
<json string>
""", True)


You can use the df.write.json API to write to any specific location you need.

Syntax: df.write.json('location where you want to save the json file')

Example: df.write.json("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/iot_devices.json")

Here are the steps to save JSON documents to Azure Data Lake Storage Gen2 using Azure Databricks.

Step 1: Use the spark.read.json API to read the JSON file and create a DataFrame.

Step 2: Mount the storage location to a Databricks DBFS directory, using the instructions in the doc below:

https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2

Step 3: Use the df.write.json API to write to the mount point, which will write the output to the storage account.
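Putting the steps together, a minimal sketch (assuming the /mnt/flightdata mount point from the first answer; sales.json is a hypothetical source file name, and step 2, the mount itself, follows the linked doc):

# Step 1: read the JSON document into a DataFrame
# ("sales.json" is a hypothetical file already present on the mount)
df = spark.read.json("/mnt/flightdata/sales.json")

# Step 3: write the DataFrame back out as JSON under the mount point,
# which persists it to the Azure Data Lake Storage Gen2 account
df.write.mode("overwrite").json("/mnt/flightdata/output/sales_json")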

For more details, refer to the articles below:

Azure Databricks – JSON files

Sample notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/adls-passthrough-gen2.html
