1
votes

I want to copy data from Azure Table Storage to Azure SQL Server using Azure Data Factory, but I get a strange error.

In my Azure Table Storage I have a column which contains multiple data types (this is how Table Storage works) E.G. Date time and String.

In my Data Factory project I mentioned that the entire column is string, but for some reason the Data Factory assumes the data type based on the first cell that it encounters during the extraction process.

In my Azure SQL Server database all columns are string.

Example

I have this table in Azure Table Storage: Flights

RowKey   PartitionKey   ArrivalTime
--------------------------------------------------
1332-2   2213dcsa-213   04/11/2017 04:53:21.707 PM   - this cell is DateTime

1332-2   2213dcsa-214   DateTime.Null                - this cell is String

If my table is like the one below, the copy process will work, because the first row is string and it will convert the entire column to string.

RowKey   PartitionKey   ArrivalTime
--------------------------------------------------
1332-2   2213dcsa-214   DateTime.Null                - this cell is String

1332-2   2213dcsa-213   04/11/2017 04:53:21.707 PM   - this cell is DateTime

Note: I am not allowed to change the data type in Azure Table Storage, move the rows or to add new ones.

Below are the input and output data sets from Azure Data Factory:

       "datasets": [
        {
            "name": "InputDataset",
            "properties": {
                "structure": [
                    {
                        "name": "PartitionKey",
                        "type": "String"
                    },
                    {
                        "name": "RowKey",
                        "type": "String"
                    },
                    {
                        "name": "ArrivalTime",
                        "type": "String"
                    }
                ],
                "published": false,
                "type": "AzureTable",
                "linkedServiceName": "Source-AzureTable",
                "typeProperties": {
                    "tableName": "flights"
                },
                "availability": {
                    "frequency": "Day",
                    "interval": 1
                },
                "external": true,
                "policy": {}
            }
        },
        {
            "name": "OutputDataset",
            "properties": {
                "structure": [
                    {
                        "name": "PartitionKey",
                        "type": "String"
                    },
                    {
                        "name": "RowKey",
                        "type": "String"
                    },
                    {
                        "name": "ArrivalTime",
                        "type": "String"
                    }
                ],
                "published": false,
                "type": "AzureSqlTable",
                "linkedServiceName": "Destination-SQLAzure",
                "typeProperties": {
                    "tableName": "[dbo].[flights]"
                },
                "availability": {
                    "frequency": "Day",
                    "interval": 1
                },
                "external": false,
                "policy": {}
            }
        }
    ]

Does anyone knows a solution to this issue?

1
Could you please provide the JSON for the table input dataset? ThanksPaul Andrew
Hi Paul, I've added the input and output data sets from Data Factory.Johnny

1 Answers

0
votes

I've just been playing around with this. I think you have 2 options to deal with this.

Option 1

Simply remove the data type attribute from your input dataset. In the 'structure' block of the input JSON table dataset you don't have to specify the type attribute. Remove or comment it out.

For example:

{
"name": "InputDataset-ghm",
"properties": {
    "structure": [
        {
            "name": "PartitionKey",
            "type": "String"
        },
        {
            "name": "RowKey",
            "type": "String"
        },
        {
            "name": "ArrivalTime"
            /* "type": "String"  --<<<<<<  Optional! */
        },

This should mean the data type is not validated on read.

Option 2

Use a custom activity upstream of the SQL DB table load to cleanse and transform the table data. This will mean breaking out the C# and require a lot more dev time. But you may want to reuse the cleaning code for other datasets.

Hope this helps.