
I'm working with Azure Databricks and Blob Storage. I have a storage account that stores data from IoT devices every hour, so the folder structure is {year/month/day/hour} and the data is stored as CSV files. My requirement is to access the files from Azure Databricks on a daily basis (so there will be 24 folders, numbered 0-23) and perform some calculations.

See if this link helps? - Joy Wang-MSFT
Which language are you using, Scala or Python? - Thomas
I'm using Python. - Alex

1 Answer


In order to process many files under a WASB container, you'll need to use Hadoop-style glob patterns in the input path. The patterns are as follows, somewhat similar to regular expressions (see the sketch after this list):

* (matches zero or more characters)
? (matches a single character)
[ab] (character class)
[^ab] (negated character class)
[a-b] (character range)
{a,b} (alternation)
\c (escape character)
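
As a rough illustration of how these patterns map onto the hourly folder layout, here is a minimal sketch; it assumes the blob container is already mounted at /mnt/iot (a placeholder mount point) and that the paths follow the year/month/day/hour structure described in the question:

# `spark` is the SparkSession that Azure Databricks notebooks provide automatically.
# Placeholder paths - substitute your own mount point and dates.
spark.read.format("csv").load("/mnt/iot/2021/05/01/*/*.csv")          # every hour folder (0-23) of one day
spark.read.format("csv").load("/mnt/iot/2021/05/01/[0-9]/*.csv")      # hours 0 through 9 only (character range)
spark.read.format("csv").load("/mnt/iot/2021/05/01/{6,12,18}/*.csv")  # hours 6, 12 and 18 only (alternation)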

For your use case, with the CSV files sitting inside the hour folders, the following should read everything under the container:

df = spark.read.format("csv").load("/container/*/*/*/*/*.csv")
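
Since the requirement is a daily run over the 24 hour folders, a sketch along these lines may be closer to what's needed. The wasbs URL, the date, the column names, and the aggregation are all assumptions, since the actual calculation isn't specified:

from pyspark.sql import functions as F

# `spark` is the SparkSession that Azure Databricks provides automatically.
# Placeholder account/container/date - substitute your own values.
day_path = "wasbs://mycontainer@myaccount.blob.core.windows.net/2021/05/01/*/*.csv"

df = (spark.read
      .format("csv")
      .option("header", "true")       # assumes the files have a header row
      .option("inferSchema", "true")
      .load(day_path))

# Example calculation: average of a hypothetical "temperature" column per device.
daily_avg = df.groupBy("device_id").agg(F.avg("temperature").alias("avg_temperature"))
daily_avg.show()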