
I'm trying to connect from Azure SQL DW via PolyBase to a Parquet file in Azure Data Lake Storage Gen2. This is my code:

CREATE DATABASE SCOPED CREDENTIAL DSC_ServicePrincipal
    WITH IDENTITY = '1234567890@https://login.microsoftonline.com/1234567890/oauth2/token',
    SECRET = '1234567890'
GO

CREATE EXTERNAL DATA SOURCE [DS_ADLS] WITH (TYPE = HADOOP, 
        LOCATION = N'abfss://xxxx@xxxx.dfs.core.windows.net', 
        CREDENTIAL = DSC_ServicePrincipal)
GO

CREATE EXTERNAL FILE FORMAT [ParquetFileFormatSnappy] 
WITH (FORMAT_TYPE = PARQUET, DATA_COMPRESSION = N'org.apache.hadoop.io.compress.SnappyCodec')
GO

CREATE EXTERNAL TABLE [dbo].[DimDate]
  (
      [DateSKey]        int         not null,
      [Date]            date        not null,
      [Year]            int         not null,
      [Month]           int         not null,
      [Day]             int         not null,
      [WeekOfYear]      int         not null,
      [MonthNameShort]  varchar(50) not null,
      [MonthName]       varchar(50) not null,
      [DayNameShort]    varchar(50) not null,
      [DayName]         varchar(50) not null
  )
  WITH (DATA_SOURCE = [DS_ADLS],
        LOCATION = N'/PRESENTED/dimDate',
        FILE_FORMAT = [ParquetFileFormatSnappy],
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0)

The CREATE EXTERNAL TABLE statement fails to execute, and the following error is returned:

Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message: HdfsBridge::isDirExist - Unexpected error encountered checking whether directoy exists or not: AbfsRestOperationException: HEAD https://xxxx.dfs.core.windows.net/xxxx?resource=filesystem&timeout=90 StatusCode=403 StatusDescription=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. ErrorCode= ErrorMessage=

The directory does exist and my service principal has access. I have confirmed this by using the same service principal from Databricks and reading the file without error.

I'm lost as to what I've done wrong.


2 Answers

1 vote

The issue is that user-supplied OAuth credentials are not currently supported for abfs/abfss.

The supported authentication methods are:

  • Storage account key (see the sketch after this list)
  • MSI-based authentication
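
If switching to a storage account key is an option, the credential and external data source would look roughly like the following. This is a minimal sketch: the credential and data source names, the key placeholder, and the xxxx container/account values are illustrative, and with key-based auth the IDENTITY string can be any value.

CREATE DATABASE SCOPED CREDENTIAL DSC_StorageKey
    WITH IDENTITY = 'user',   -- any string; not used for storage-key auth
    SECRET = '<storage-account-access-key>'
GO

CREATE EXTERNAL DATA SOURCE [DS_ADLS_Key] WITH (TYPE = HADOOP, 
        LOCATION = N'abfss://xxxx@xxxx.dfs.core.windows.net', 
        CREDENTIAL = DSC_StorageKey)
GO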
0 votes

By the way, OAuth 2.0 is now supported on Gen2. If you've verified the path and permissions are correct, e.g. via Azure Databricks, I would suspect it's network related. Try using a storage account with unrestricted network settings to confirm.

You can get this error as a result of the storage firewall being enabled - looking at the docs here you can see the asterisks denoting the firewall restrictions for the various permission types. For COPY INTO you can refer to this authorisation table.
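
For reference, a COPY INTO load authenticated with a service principal looks roughly like this. It is only a sketch: dbo.DimDate_Staging is a hypothetical regular (non-external) target table, and the URL, client id, tenant id and secret are placeholders.

-- COPY INTO loads into a regular table, not an external one
COPY INTO dbo.DimDate_Staging
FROM 'https://xxxx.dfs.core.windows.net/xxxx/PRESENTED/dimDate'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = '<client_id>@https://login.microsoftonline.com/<tenant_id>/oauth2/token',
                  SECRET = '<client_secret>')
)
GO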

Usually you can "allow trusted Microsoft services" in the storage account's network settings; however, that only works when using MSI - in your case you're using an SPN.
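
For the MSI route, the database scoped credential uses the literal identity 'Managed Service Identity' and no secret. A minimal sketch, assuming the server's managed identity has been granted access to the storage account (the DSC_MSI / DS_ADLS_MSI names and the xxxx values are just placeholders):

CREATE DATABASE SCOPED CREDENTIAL DSC_MSI
    WITH IDENTITY = 'Managed Service Identity'
GO

CREATE EXTERNAL DATA SOURCE [DS_ADLS_MSI] WITH (TYPE = HADOOP, 
        LOCATION = N'abfss://xxxx@xxxx.dfs.core.windows.net', 
        CREDENTIAL = DSC_MSI)
GO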

If you still want to query your data lake using a service principal or user identity (as opposed to the MSI), you can try following the steps here via PowerShell to allow the Synapse dedicated pool (formerly SQL DW) to pass the storage firewall, and then switch to the SPN or UPN to authorise against POSIX ACLs.

Finally, here is a good blog post that discusses the different options.