5
votes

I am getting an intermittent HTTP error when I try to load the contents of files in Azure Databricks from ADLS Gen2. The storage account has been mounted using a service principal associated with Databricks, and that service principal has been given Storage Blob Data Contributor access through RBAC on the data lake storage account. A sample statement to load a file is:

df = spark.read.format("orc").load("dbfs:/mnt/{storageaccount}/{filesystem}/{filename}")
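For context, the mount was created in the usual way with OAuth credentials for the service principal; a minimal sketch is below (the placeholder names in braces and angle brackets are not the actual values used here):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 filesystem so it is reachable under dbfs:/mnt/...
dbutils.fs.mount(
    source="abfss://{filesystem}@{storageaccount}.dfs.core.windows.net/",
    mount_point="/mnt/{storageaccount}/{filesystem}",
    extra_configs=configs)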

The error message I get is:

Py4JJavaError: An error occurred while calling o214.load. : java.io.IOException: GET https://{storageaccount}.dfs.core.windows.net/{filesystem}/{filename}?timeout=90 StatusCode=412 StatusDescription=The condition specified using HTTP conditional header(s) is not met.
ErrorCode=ConditionNotMet ErrorMessage=The condition specified using HTTP conditional header(s) is not met.
RequestId:51fbfff7-d01f-002b-49aa-4c89d5000000
Time:2019-08-06T22:55:14.5585584Z

The error does not occur for all files in the filesystem: most files load fine, and only some fail. I'm not sure what the issue is here. Any help will be appreciated.

Please raise a support case with Microsoft via the portal. We saw this yesterday as well and are doing the same. Clearly something changed or broke. Out of interest, which region are you in? We are in North Europe. – simon_dmorias
We're in North Europe and facing the same problem; we will raise a support ticket. – fathomson
We have raised it with Microsoft. No real progress at the moment, other than confirmation that it is not a permission issue. Will post the resolution when we get one from Microsoft. Our instance is in Australia East. – Amit Sukralia

3 Answers

1
vote

This has been resolved now. The underlying issue was due to a change at Microsoft's end. This is the RCA I got from Microsoft Support:

There was a storage configuration that was turned on incorrectly during the latest storage tenant upgrade. This type of error would only show up for namespace-enabled accounts on the latest upgraded tenant. The mitigation for this issue is to turn off the configuration on the specific tenant, and we have kicked off the supersonic configuration rollout for all the tenants. We have since added additional storage upgrade validation for ADLS Gen 2 to help cover this type of scenario.

0
votes

I had the same problem on one file today. Downloading the file, deleting it from storage, and putting it back solved the problem. Renaming the file did not work.

Edit: we are seeing it on more files, seemingly at random. We worked around the problem by copying the entire folder to a new folder and renaming it back to the original name. Jobs run without problems again.
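A minimal sketch of that workaround using Databricks file utilities (the folder paths are placeholders, not our actual ones):

src = "dbfs:/mnt/{storageaccount}/{filesystem}/data"
tmp = "dbfs:/mnt/{storageaccount}/{filesystem}/data_tmp"

dbutils.fs.cp(src, tmp, recurse=True)  # copy the entire folder to a new location
dbutils.fs.rm(src, recurse=True)       # remove the problematic original
dbutils.fs.mv(tmp, src, recurse=True)  # move the copy back under the original name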

Still, the question remains: why did the files end up in this state?

0
votes

Same issue here. After some research, it seems it was probably an If-Match ETag precondition failure on the HTTP GET request. Microsoft describes how error 412 is returned when such a condition fails in this post: https://azure.microsoft.com/de-de/blog/managing-concurrency-in-microsoft-azure-storage-2/
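For illustration, here is a generic (not Azure-specific) sketch of how an If-Match precondition produces a 412 when the resource's ETag has changed since it was captured; the URL is a placeholder:

import requests

url = "https://example.com/resource"  # hypothetical resource URL

# First read: capture the ETag of the version we saw.
resp = requests.get(url)
etag = resp.headers.get("ETag")

# Conditional read: only succeeds if the resource still has that ETag.
resp2 = requests.get(url, headers={"If-Match": etag})
if resp2.status_code == 412:
    # Precondition failed: the stored ETag no longer matches, i.e.
    # "The condition specified using HTTP conditional header(s) is not met."
    print("412 ConditionNotMet: the resource changed since the ETag was captured")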

Regardless, Databricks seems to have resolved the issue on their end now.