
I'm working with Azure Databricks and Blob Storage. I have a storage account that stores data from IoT devices every hour, so the folder structure is {year/month/day/hour} and the data is stored as CSV files. My requirement is to access the files from Azure Databricks on a daily basis (so there will be 24 folders, numbered 0-23) and perform some calculations.

See if this link helps? - Joy Wang-MSFT
Which language are you using, Scala or Python? - Thomas
I'm using Python. - Alex

1 Answer


In order to process many files under a WASB container, you'll need to use Hadoop-style glob patterns in the input path. The patterns are as follows, somewhat similar to regular expressions (see the sketch after this list):

* (matches zero or more characters)
? (matches a single character)
[ab] (character class)
[^ab] (negated character class)
[a-b] (character range)
{a,b} (alternation)
\c (escape character)
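
As a rough illustration of how these patterns map onto the hourly folder layout, here is a minimal sketch; it assumes the blob container is already mounted at /mnt/iot (a placeholder mount point) and that the paths follow the year/month/day/hour structure described in the question:

# `spark` is the SparkSession that Azure Databricks notebooks provide automatically.
# Placeholder paths - substitute your own mount point and dates.
spark.read.format("csv").load("/mnt/iot/2021/05/01/*/*.csv")          # every hour folder (0-23) of one day
spark.read.format("csv").load("/mnt/iot/2021/05/01/[0-9]/*.csv")      # hours 0 through 9 only (character range)
spark.read.format("csv").load("/mnt/iot/2021/05/01/{6,12,18}/*.csv")  # hours 6, 12 and 18 only (alternation)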

For your use case, with the CSV files sitting inside the hour folders, the following should read everything under the container:

df = spark.read.format("csv").load("/container/*/*/*/*/*.csv")
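
Since the requirement is a daily run over the 24 hour folders, a sketch along these lines may be closer to what's needed. The wasbs URL, the date, the column names, and the aggregation are all assumptions, since the actual calculation isn't specified:

from pyspark.sql import functions as F

# `spark` is the SparkSession that Azure Databricks provides automatically.
# Placeholder account/container/date - substitute your own values.
day_path = "wasbs://mycontainer@myaccount.blob.core.windows.net/2021/05/01/*/*.csv"

df = (spark.read
      .format("csv")
      .option("header", "true")       # assumes the files have a header row
      .option("inferSchema", "true")
      .load(day_path))

# Example calculation: average of a hypothetical "temperature" column per device.
daily_avg = df.groupBy("device_id").agg(F.avg("temperature").alias("avg_temperature"))
daily_avg.show()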