I'm trying to connect from a local Spark job to my ADLS Gen 2 data lake to read some Databricks Delta tables that I previously stored through a Databricks notebook, but I'm getting an exception I can't make sense of:
Exception in thread "main" java.io.IOException: There is no primary group for UGI <xxx> (auth:SIMPLE)
at org.apache.hadoop.security.UserGroupInformation.getPrimaryGroupName(UserGroupInformation.java:1455)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:136)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:108)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.spark.sql.delta.DeltaTableUtils$.findDeltaTableRoot(DeltaTable.scala:94)
Searching around, I haven't found many hints on this. One suggestion I did try was passing the config "spark.hadoop.hive.server2.enable.doAs" = "false", but it didn't help.
I'm using Delta Lake (io.delta) 0.3.0, Spark 2.4.2 (built for Scala 2.12), and hadoop-azure 3.2.0. I can connect to my Gen 2 account without issues through an Azure Databricks cluster/notebook.
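For reference, the relevant dependency coordinates look roughly like this (reproduced from memory, so treat them as approximate):

io.delta:delta-core_2.12:0.3.0
org.apache.spark:spark-sql_2.12:2.4.2
org.apache.hadoop:hadoop-azure:3.2.0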
I'm using code like the following:
import org.apache.spark.sql.SparkSession;

try (final SparkSession spark = SparkSession.builder().appName("DeltaLake").master("local[*]").getOrCreate()) {
    //spark.conf().set("spark.hadoop.hive.server2.enable.doAs", "false"); // tried this, no effect
    // Account key for the ADLS Gen 2 storage account:
    spark.conf().set("fs.azure.account.key.stratify.dfs.core.windows.net", "my gen 2 key");
    // Reading the Delta table is what triggers the exception above:
    spark.read().format("delta").load("abfss://[email protected]/Test");
}
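I also wondered whether the account key needs to go on the underlying Hadoop configuration instead of the Spark conf, something like the line below, but as far as I can tell the exception is thrown while the filesystem is being initialized, before authentication even comes into play:

spark.sparkContext().hadoopConfiguration().set("fs.azure.account.key.stratify.dfs.core.windows.net", "my gen 2 key");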