3 votes

I have a Data Lake Store account. I have a directory full of files containing data in JSON format, including some string values that contain times in ISO 8601 format, to wit:

{ "reading_time": "2008-09-15T15:53:00.91077" }

Now when I create a Pipeline with a Data Factory that uses these JSON files as an input dataset, it sees the value of reading_time in a typical US format: "9/15/2008 3:53:00 PM". Specifically, I get this message when I try to populate a DateTime field in the output dataset:

Column 'reports.reading_time' contains an invalid value '9/15/2008 3:53:00 PM'. Cannot convert '9/15/2008 3:53:00 PM' to type 'DateTime'

I thought hey, what if I tell my input dataset to specifically expect an ISO input date? So I tried this in my pipeline specification:

"datasets": [
  {
    "name": "ImprovedInputDataset",
      "properties": {
        "structure": [
          {
            "name": "reports.reading_time",
            "type": "Datetime",
            "format": "ISO"
          }
        ]
      }
    }
  }
]

I was pretty impressed with myself for getting a slightly different error message (see "with format 'ISO'" at the end):

Column 'reports.reading_time' contains an invalid value '9/15/2008 3:53:00 PM'. Cannot convert '9/15/2008 3:53:00 PM' to type 'DateTime' with format 'ISO'

Long story short, it seems as though something is noticing the ISO date format in my original input and doing me the dubious "favor" of converting it to a US-style date string before my pipeline gets to see it. I can't find anything in the online Azure documentation, though, that explains exactly what happens to my input dataset before my Pipeline spec executes.

I would appreciate it if someone would either a) explain what it is that's converting my ISO date/time string to a US-style date/time string, and how to correct it; or b) point me to the documentation on the "preprocessing" that must be happening inside the Data Factory before my Pipeline spec is run.

Didn't think Azure Data Factory supported JSON natively yet? Could be wrong, as it seems to be trying something. I would probably use U-SQL to do this and just use ADF to orchestrate / call the U-SQL script. Some samples here. – wBob
Azure Data Lake Store is supported as a "source" and as a "sink": docs.microsoft.com/en-us/azure/data-factory/… And the wizard gives you options for accepting an Array of Objects in JSON or a Set of Objects in JSON. So it seems that JSON is supposed to work, and Factory tries to make it work... it just gets caught up on that date format. – catfood
wBob, if you write this up as an answer it will get the bounty. – catfood
I have also encountered issues with ADFv2 formatting datetime strings using a specific locale when I haven't asked it to do anything. My issue is specifically with passing a datetime string (the lastModified output from a Get Metadata activity) as a parameter to a stored procedure invoked during a Copy activity into a SQL sink. The stored procedure accepts a varchar(256) for that parameter, but ADFv2 formats the string into a specific datetime locale first. This doesn't happen when invoking a stored procedure directly. I plan to file a bug report. – bdforbes
I should note that my workaround is to not store the lastModified property as a string in SQL. I thought I was playing it safe by storing the raw string exactly as returned by the Get Metadata activity, since I expected issues converting the string to the SQL datetime type. However, the conversion actually seems to work perfectly, so I just pass the datetime string through, with the stored procedure expecting it as datetime. – bdforbes

2 Answers

1 vote

I can reproduce this issue, but I got it to work by using the "String" datatype for the input dataset. You can also leave the datatype out entirely, e.g.

{
    "name": "InputDataset-9ad",
    "properties": {
        "structure": [
            {
                "name": "reading_time"
            }
        ],
...
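
For completeness, here is a sketch of the variant with the explicit "String" type that worked for me (same dataset; only the structure entry changes):

{
    "name": "InputDataset-9ad",
    "properties": {
        "structure": [
            {
                "name": "reading_time",
                "type": "String"
            }
        ],
...

The column then passes through as the raw string from the file, and any conversion to a datetime can happen at the sink or downstream.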

This is in line with my current thinking that JSON does not have a datetime datatype as such. The documentation suggests format should be a .NET format string, so "ISO" will never work. I spent some time trying to debug many different date formats, e.g. "yyyy-MM-ddTHH:mm:ss.fffffff", but none of them worked either. My guess is that either datetime is simply not supported for JSON, or it's buggy / has an issue with the "T" and basically ignores the format and defaults to something that looks like "en-US" in your example.

I did find that most date formats "just work" without specifying structure. If you do have something non-international or non-portable, e.g. "01/04/2017" (is it the 1st of April or the 4th of January?), then the workaround would be to import it into a staging table as a string and transform it from there.
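
As a rough sketch of that staging approach (the "StagingDataset" name and "order_date" column are hypothetical), the staging dataset would declare the ambiguous column as a plain string and leave the interpretation to the transform step:

{
    "name": "StagingDataset",
    "properties": {
        "structure": [
            {
                "name": "order_date",
                "type": "String"
            }
        ],
...

The culture question (1st of April vs. 4th of January) then gets answered explicitly in your transform rather than by whatever default the service picks.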

I do have a question outstanding with an internal newsgroup and I'll update this post if I receive any further information. NB I do not work for Microsoft.

HTH

1 vote

The Microsoft docs have the following link: type mapping, which explains what to do in these cases.

Basically, one can specify the data format of the exported JSON; the format string is the same as the one used by the C# language.
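
For example, a structure entry declaring the ISO format of the source value might look like the sketch below; the pattern is my guess at matching the sample value from the question, and per the other answer it may not take effect for JSON inputs:

"structure": [
    {
        "name": "reports.reading_time",
        "type": "Datetime",
        "format": "yyyy-MM-ddTHH:mm:ss.fffff"
    }
]

Here "yyyy-MM-ddTHH:mm:ss.fffff" is a .NET custom format string that matches "2008-09-15T15:53:00.91077" from the original files.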