I am unable to load a CSV file directly from Azure Blob Storage into an RDD using PySpark in a Jupyter Notebook.
I have read through just about all of the other answers to similar problems, but I haven't found specific instructions for what I am trying to do. I know I could also load the data into the Notebook using Pandas, but then I would need to convert the Pandas DataFrame into an RDD afterwards.
My ideal solution would look something like this, but this specific code gives me an error saying that it can't infer a schema for CSV.
#Load Data
source = <Blob SAS URL>
elog = spark.read.format("csv").option("inferSchema", "true").option("url",source).load()
I have also taken a look at this answer: reading a csv file from azure blob storage with PySpark, but I am having trouble defining the correct path.
Thank you very much for your help!