I've set up a cluster with Databricks Runtime version 5.1 (includes Apache Spark 2.4.0, Scala 2.11) and Python 3. I also installed the hadoop-azure library (hadoop-azure-3.2.0) on the cluster.

I'm trying to read a blob stored in my Blob Storage account, which is just a text file containing some numeric data delimited by spaces. I used the template generated by Databricks for reading blob data:

    spark.conf.set(
      "fs.azure.account.key."+storage_account_name+".blob.core.windows.net",
      storage_account_access_key)
    df = spark.read.format(file_type).option("inferSchema", "true").load(file_location)

where file_location is the URL of my blob file (https://xxxxxxxxxx.blob.core.windows.net).

I get the following error:

    No filesystem named https

I tried using sc.textFile(file_location) to read the file into an RDD and got the same error.



3 Answers

2 votes

Your file_location should be in the format:

    "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>"

See: https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html
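To make the format concrete, here is a small helper that assembles a wasbs:// URL from its parts. The container, account, and path names below are placeholders, not real resources:

    # Build the wasbs:// path Spark expects for Azure Blob Storage.
    # All names here are illustrative placeholders.
    def wasbs_path(container, storage_account, directory=""):
        """Return "wasbs://<container>@<account>.blob.core.windows.net/<dir>"."""
        base = "wasbs://{}@{}.blob.core.windows.net/".format(container, storage_account)
        return base + directory.lstrip("/")

    file_location = wasbs_path("my-container", "mystorageacct", "data/numbers.txt")
    # "wasbs://my-container@mystorageacct.blob.core.windows.net/data/numbers.txt"

Note the scheme is wasbs (not https): Spark dispatches on the URL scheme to pick a filesystem implementation, which is why an https:// URL produces "No filesystem named https".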

0 votes

These three lines of code worked for me:

    spark.conf.set("fs.azure.account.key.STORAGE_ACCOUNT.blob.core.windows.net", "BIG_KEY")
    df = spark.read.csv("wasbs://CONTAINER@STORAGE_ACCOUNT.blob.core.windows.net/")
    df.select('*').show()

NOTE that the second line ends with .net/ because I do not have a sub-folder.
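One more detail for the original question: the file is space-delimited rather than comma-delimited, so spark.read.csv will also need .option("sep", " ") alongside .option("inferSchema", "true"). Here is a quick local sketch of the equivalent parsing, useful for sanity-checking the data format before pointing Spark at the blob (the sample data is made up):

    # Locally mimic what Spark's csv reader with sep=" " and
    # inferSchema would do to space-delimited numeric lines.
    sample = "1.0 2.5 3.0\n4.0 5.5 6.0"

    rows = [[float(v) for v in line.split()] for line in sample.splitlines()]
    # rows == [[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]]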