
I am using the code below to generate a JSON result set from a SQL Server table.

PowerShell:

$InstanceName = "SQLTEST1\ENG_TST1"
$connectionString = "Server=$InstanceName;Database=dbadb;Integrated Security=True;"

$query = "SELECT * FROM dbo.sales"

# Open a connection to the SQL Server instance
$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = $connectionString
$connection.Open()

# Run the query and load the results into a DataTable
$command = $connection.CreateCommand()
$command.CommandText = $query
$result = $command.ExecuteReader()

$table = New-Object System.Data.DataTable
$table.Load($result)

# Convert the rows to JSON
$table | Select-Object $table.Columns.ColumnName | ConvertTo-Json

$connection.Close()

Could you please guide me on how to store these JSON documents in Azure Data Lake Storage Gen2 using Azure Databricks?


2 Answers


If you want to save files to Azure Data Lake Storage Gen2 from Azure Databricks, please refer to the following steps:

  1. Create an Azure Data Lake Storage Gen2 account.
az login
az storage account create \
    --name <account-name> \
    --resource-group <group name> \
    --location westus \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
  2. Create a service principal and assign the Storage Blob Data Contributor role to it, scoped to the Data Lake Storage Gen2 storage account. The appId, password, and tenant values returned by the command below are used in the mount configuration in step 4.
az login

az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>
  3. Create a Spark cluster in Azure Databricks.

  4. Mount Azure Data Lake Storage Gen2 in Azure Databricks (Python).

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<appId>",
           "fs.azure.account.oauth2.client.secret": "<clientSecret>",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
    mount_point = "/mnt/flightdata",
    extra_configs = configs)
  5. Save the JSON to Azure Data Lake Storage Gen2 (a verification sketch follows the code below).
dbutils.fs.put("/mnt/flightdata/<file name>", """
<json string>
""", True)


You can use the df.write.json API to write to any specific location you need.

Syntax: df.write.json('location where you want to save the json file')

Example: df.write.json("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/iot_devices.json")

Here are the steps to save JSON documents to Azure Data Lake Storage Gen2 using Azure Databricks.

Step 1: Use the spark.read.json API to read the JSON file and create a DataFrame.

Step 2: Mount the storage location to a Databricks DBFS directory, using the instructions in the doc below:

https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2

Step 3: Use the df.write.json API to write to the mount point, which will write the output to the storage account.
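Putting the steps together, a minimal sketch (assuming the /mnt/flightdata mount point from the first answer; sales.json is a hypothetical source file name, and step 2, the mount itself, follows the linked doc):

# Step 1: read the JSON document into a DataFrame
# ("sales.json" is a hypothetical file already present on the mount)
df = spark.read.json("/mnt/flightdata/sales.json")

# Step 3: write the DataFrame back out as JSON under the mount point,
# which persists it to the Azure Data Lake Storage Gen2 account
df.write.mode("overwrite").json("/mnt/flightdata/output/sales_json")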

For more details, refer to the articles below:

Azure Databricks – JSON files

Sample notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/adls-passthrough-gen2.html
