I'm using Data Factory with blob storage.
I sometimes get the error below intermittently - it can occur on different pipelines/data sources. However, it's always the same error, regardless of which task fails: 400 The specified block list is invalid.
Copy activity encountered a user error at Sink side: ErrorCode=UserErrorBlobUploadFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to upload blob 'https://blob.core.windows.net/', detailed message: The remote server returned an error: (400) Bad Request.,Source=,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified block list is invalid. Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage
This seems to be most common when more than one task is writing data to the storage account at the same time. Is there anything I can do to make this process more reliable? Is it possible something has been misconfigured? It's causing slices to fail in Data Factory, so I'd really like to know what I should be investigating.
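I don't know whether this is actually what the copy activity does internally, but if I understand block blobs correctly, two writers targeting the same blob can step on each other: committing a block list discards any uncommitted blocks it doesn't reference, so the slower writer's commit then fails. A rough sketch of that scenario (using the azure-storage-blob Python SDK purely for illustration; the names and connection string are placeholders, not my real config):

# Rough repro sketch of the kind of interference I suspect, assuming the
# azure-storage-blob (v12) Python package. Names are placeholders.
import uuid

from azure.storage.blob import BlobBlock, BlobClient

CONN_STR = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."

def new_block_id() -> str:
    # Block IDs just need to be unique and the same length within a blob.
    return uuid.uuid4().hex

# Two writers pointed at the *same* blob.
writer_a = BlobClient.from_connection_string(CONN_STR, container_name="staging", blob_name="out.csv")
writer_b = BlobClient.from_connection_string(CONN_STR, container_name="staging", blob_name="out.csv")

# Writer A stages a block but doesn't commit yet.
block_a = new_block_id()
writer_a.stage_block(block_id=block_a, data=b"data from writer A")

# Writer B stages its own block and commits. As I understand it, committing a
# block list discards any uncommitted blocks it doesn't reference - including A's.
block_b = new_block_id()
writer_b.stage_block(block_id=block_b, data=b"data from writer B")
writer_b.commit_block_list([BlobBlock(block_id=block_b)])

# Writer A now commits a block that no longer exists and gets back
# 400: "The specified block list is invalid."
writer_a.commit_block_list([BlobBlock(block_id=block_a)])

If that's the failure mode, two activities writing to the same output blob at the same time would explain the intermittent 400s - but I haven't been able to confirm that's what's happening here.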
A sample pipeline that has suffered from this issue:
{
    "$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Pipeline.json",
    "name": "Pipeline",
    "properties": {
        "description": "Pipeline to copy Processed CSV from Data Lake to blob storage",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "AzureDataLakeStoreSource"
                    },
                    "sink": {
                        "type": "BlobSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    }
                },
                "inputs": [ { "name": "DataLake" } ],
                "outputs": [ { "name": "Blob" } ],
                "policy": {
                    "concurrency": 10,
                    "executionPriorityOrder": "OldestFirst",
                    "retry": 0,
                    "timeout": "01:00:00"
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "CopyActivity"
            }
        ],
        "start": "2016-02-28",
        "end": "2016-02-29",
        "isPaused": false,
        "pipelineMode": "Scheduled"
    }
}
I'm only using standard LRS storage, but I still wouldn't expect it to throw intermittent errors.
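The only mitigation I've thought of so far is to add retries and drop the concurrency on the copy activity, along these lines (untested - I'm guessing at sensible values, and I suspect it only masks the underlying problem):

"policy": {
    "concurrency": 1,
    "executionPriorityOrder": "OldestFirst",
    "retry": 3,
    "timeout": "01:00:00"
}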
EDIT: adding the linked service JSON:
{
    "$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.LinkedService.json",
    "name": "Ls-Staging-Storage",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=;AccountKey="
        }
    }
}