
I am using Azure Data Lake Store for storing simple JSON files with the following JSON:

{
  "email": "[email protected]",
  "id": "823956724385"
}

The JSON file's name is myJson1.json. The Azure Data Lake Store is successfully mounted to Azure Databricks.

I am able to successfully load the JSON file via

df = spark.read.option("multiline", "true").json(fi.path)

fi.path is a FileInfo object that points to the myJson1.json file from above.

When I do

df = spark.read.option("multiline", "true").json(fi.path)
df.show()

I get the JSON content printed out correctly as a DataFrame:

+---------------------+------------+
|                email|          id|
+---------------------+------------+
|[email protected]|823956724385|
+---------------------+------------+

What I want to do is load the JSON file with json.load(filename), so that I can work with the JSON object within Python.

When I do

import json

with open('adl://.../myJson1.json', 'r') as file:
    jsonObject0 = json.load(file)

then I get the following error:

[Errno 2] No such file or directory 'adl://.../myJson1.json'

When I try the following (the mount point is correct; I can list the file and also read it with spark.read into a DataFrame)

jsonObject = json.load("/mnt/adls/data/myJson1.json")

then I get the following error (json.load expects a file-like object, not a path string):

'str' object has no attribute 'read'

I have no idea what else to do to get the JSON loaded. My goal is to read the JSON object and iterate through its keys and values.


1 Answer


The trick was to use the following syntax for the file URL:

/dbfs/mnt/adls/data/myJson1.json

I had to add /dbfs/ at the beginning of the URL, or rather replace dbfs:/ with /dbfs/.
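For example, a path returned by dbutils.fs.ls (like fi.path in the question) can be converted that way; a minimal sketch, assuming fi is a FileInfo object from dbutils.fs.ls:

# Assumption: fi comes from dbutils.fs.ls, so fi.path looks like
# 'dbfs:/mnt/adls/data/myJson1.json'.
local_path = fi.path.replace('dbfs:/', '/dbfs/', 1)
# local_path is now '/dbfs/mnt/adls/data/myJson1.json', usable with open()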

Then I could use

import json

# Note the /dbfs/ prefix instead of dbfs:/
with open('/dbfs/mnt/adls/ingress/marketo/update/leads/leads-json1.json', 'r') as f:
    data = f.read()

jsonObject = json.loads(data)

Maybe there is an easier way, but this works for now.
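To then iterate through the keys and their values, as the question asks, a minimal sketch could look like this (assuming the mount path /mnt/adls/data/myJson1.json from the question; adjust to your own mount point):

import json

# The /dbfs/ prefix makes the DBFS mount visible to local-file APIs like open()
with open('/dbfs/mnt/adls/data/myJson1.json', 'r') as f:
    jsonObject = json.load(f)  # json.load accepts the file object directly

# Iterate through the keys and their values
for key, value in jsonObject.items():
    print(key, value)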