2
votes

We have JSON's that contain timestamps in the format:

2016-11-03T03:05:21.673Z

2016-11-03T03:05:21.63Z

So the appropriate format to parse the data is yyyy-MM-ddTHH:mm:ss.FFF\Z

I tried all of these variants to explain to ADF how to parse it:

"structure": [
  {
    "name": "data_event_time",
    "type": "DateTime",
    "format": "yyyy-MM-ddTHH:mm:ss.FFF\\Z"
  },
  ...
]

"structure": [
  {
    "name": "data_event_time",
    "type": "DateTimeOffset",
    "format": "yyyy-MM-ddTHH:mm:ss.FFFZ"
  },
  ...
]

"structure": [
  {
    "name": "data_event_time",
    "type": "DateTimeOffset"
  },
  ...
]

"structure": [
  {
    "name": "data_event_time",
    "type": "DateTime"
  },
  ...
]

In all of these cases above ADF fails with the error:

Copy activity encountered a user error at Sink side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'data_event_time' contains an invalid value '2016-11-13T00:44:50.573Z'. Cannot convert '2016-11-13T00:44:50.573Z' to type 'DateTimeOffset' with format 'yyyy-MM-dd HH:mm:ss.fffffff zzz'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'.

What am i doing wrong? How to fix it?

2

2 Answers

1
votes

The previous issue has been fixed. Thanx wBob.

But now i have a new issue at the sink level.

I'm trying to load data from Azure Blob Storage to Azure DWH via ADF + PolyBase:

      "sink": {
        "type": "SqlDWSink",
        "sqlWriterCleanupScript": "$$Text.Format('DELETE FROM [stage].[events] WHERE data_event_time >= \\'{0:yyyy-MM-dd HH:mm}\\' AND data_event_time < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)",
        "writeBatchSize": 6000000,
        "writeBatchTimeout": "00:15:00",
        "allowPolyBase": true,
        "polyBaseSettings": {
          "rejectType": "percentage",
          "rejectValue": 10.0,
          "rejectSampleValue": 100,
          "useTypeDefault": true
        }
      },
      "enableStaging": true,
      "stagingSettings": {
        "linkedServiceName": "AppInsight-Stage-BlobStorage-LinkedService"
      },
      "translator": {
        "type": "TabularTranslator",
        "columnMappings": "..."
      }

But the process fails with error:

Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=107091;Query aborted-- the maximum reject threshold (10 %) was reached while reading from an external source: 6602 rows rejected out of total 6602 rows processed. Rows were rejected while reading from external source(s). 52168 rows rejected from external table [ADFCopyGeneratedExternalTable_0530887f-f870-4624-af46-249a39472bf3] in plan step 2 of query execution: Location: '/13/2cd1d10f-4f62-4983-a38d-685fc25c40a2_20161102_135850.blob' Column ordinal: 0, Expected data type: DATETIMEOFFSET(7) NOT NULL, Offending value: 2016-11-02T13:56:19.317Z (Column Conversion Error), Error: Conversion failed when converting the NVARCHAR value '2016-11-02T13:56:19.317Z' to data type DATETIMEOFFSET. Location: '/13/2cd1d10f-4f62-4983-a38d-685fc25c40a2_20161102_135850.blob' Column ordinal: 0, Expected ...

I read the Azure SQL Data Warehouse loading patterns and strategies

If the DATE_FORMAT argument isn’t designated, the following default formats are used:

DateTime: ‘yyyy-MM-dd HH:mm:ss’

SmallDateTime: ‘yyyy-MM-dd HH:mm’

Date: ‘yyyy-MM-dd’

DateTime2: ‘yyyy-MM-dd HH:mm:ss’

DateTimeOffset: ‘yyyy-MM-dd HH:mm:ss’

Time: ‘HH:mm:ss’

Looks like i have no ability at ADF level to specify the datetime format for PolyBase.

Does someone know any workaround?

0
votes

We looked at a similar issue recently here:

What's reformatting my input data before I get to it?

JSON does not have a Datetime format as such, so leave the type and format elements out. Then your challenge is with the sync. Inserting these values into an Azure SQL Database for example should work.

"structure": [
  {
    "name": "data_event_time"
  },
  ...

Looking at your error message, I would expect that to work inserting into a DATETIME column in SQL Data Warehouse (or SQL Database or SQL Server on a VM) but it is ordinary DATETIME data, not DATETIMEOFFSET.

If you have issues inserting into the target sink, you may have to workaround by not using the Polybase checkbox and code that side of the process yourself, eg

  1. Copy raw files to blob storage or Azure Data Lake (now Polybase supports ADLS)
  2. Create external tables over the files where the datetime data is set as varchar data-type
  3. CTAS the data into an internal table, also converting the string datetime format to a proper DATETIME using T-SQL