0
votes

I have a situation where I'm handling some data ingestion code, where the environment is set up as such: azure blob storage (not adls) is being used as temporary storage, and the code is supposed to create an external table from the files in the temporary storage, where from the data will be copied to azure synapse analytics. Now, blob storage and sql server are in a virtual network. The existing code did not account for vnets. So the statements to create the external data source has been modified to

CREATE EXTERNAL DATA SOURCE mydatasrc

WITH ( TYPE = HADOOP, LOCATION = 'abfss://[email protected]',

CREDENTIAL = myMsiCred);

from the previous wasbs uri, and to use managed service identity.

The external table statement is:

CREATE EXTERNAL TABLE myschema.myExternalTable (

<columns list>

) WITH (

LOCATION = '/path/to/folder',

DATA_SOURCE = mydatasrc,

FILE_FORMAT = myFormat,

REJECT_TYPE = VALUE,

REJECT_VALUE = 0 );

At first I was getting a 'Please enable Managed service identity on this server', which took a long while to fix, finally using a azure powershell command. Now it's a

Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message: HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD

I thought at first I somehow need to set up authentication for the sql server instance so it can access blob storage inside the vnet. Now I read somewhere that abfss is only used by adls gen2, and blob storage exclusively uses wasbs. Is that true?

1

1 Answers

1
votes

Yes, it is true that abfss is the URI scheme used for ADLS Gen2 while wasbs is the URI scheme used for Blob Storage. They are different since they have slightly different semantics when handling folders I think.

Also, you probably need to make sure that you have the right permissions when accessing.