
I have a Databricks cluster running fine, and using the following code I can mount my Data Lake Storage Gen2 account as well. I am mounting everything on /mnt/data1:

// OAuth settings for the service principal used to access ADLS Gen2
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> appID,
  "fs.azure.account.oauth2.client.secret" -> password,
  "fs.azure.account.oauth2.client.endpoint" -> ("https://login.microsoftonline.com/" + tenantID + "/oauth2/token"),
  "fs.azure.createRemoteFileSystemDuringInitialization" -> "true")

// Mount the container root at /mnt/data1
dbutils.fs.mount(
  source = "abfss://" + fileSystemName + "@" + storageAccountName + ".dfs.core.windows.net/",
  mountPoint = "/mnt/data1",
  extraConfigs = configs)
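
For reference, a quick way to sanity-check the mount (the same check suggested in the comments below) is to list it with dbutils.fs.ls, a standard Databricks notebook utility; a minimal sketch:

// Files should show up directly under /mnt/data1, with no extra
// container-name folder in between
display(dbutils.fs.ls("/mnt/data1"))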

Up to this point everything is fine and working. But when I try to access a file from the mount location with the following command

val df = spark.read.csv("/mnt/data1/creodemocontainer/movies.csv")

I get the following error:

java.io.FileNotFoundException: dbfs:/mnt/data1/creodemocontainer2/movies.csv
    at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$getFileStatus$2(DatabricksFileSystemV2.scala:775)

Though I can connect to and load those files in Power BI without any issue. I haven't found a clue in the last two days, so any help would be really appreciated.

Thanks in advance.

Could you please use dbutils.fs.ls to check if the file exists? - Jim Xu
Here is the error: java.io.FileNotFoundException: dbfs:/mnt/data1/creodemocontainer/movies.csv at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$getFileStatus$2(DatabricksFileSystemV2.scala:775) - Rohit
Thanks for the clue. I'm not supposed to add the container name while reading. Everything is working fine now. - Rohit
Since the issue has been resolved, could you please post your answer? - Jim Xu

1 Answer


Sharing the answer as per the comment by the original poster:

I'm not supposed to add the container name while reading.

val df = spark.read.csv("/mnt/data1/creodemocontainer/movies.csv")  // fails: container name repeated inside the mounted path

Remove the container name, since it is already implied by the mount point. Everything works fine now:

val df = spark.read.csv("/mnt/data1/movies.csv")
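
As a follow-up, if movies.csv has a header row, the same read can pick it up with Spark's standard CSV options; a small sketch (header and inferSchema are generic spark.read options, not something from the original post):

// Optional variant: treat the first line as a header and let Spark
// infer column types (standard spark.read CSV options)
val movies = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/mnt/data1/movies.csv")

movies.show(5)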