1
votes

I have an azure blob with txt file. Some columns have empty values, so when they are saved into database table, they are NULL. I can make it work with direct SQL and SSIS ETL packages.

Example of row:
1002,100,Butter,whipped with salt BUTTER,WHIPPED W SALT,Y,0,6.38,,,

Last three are suppose to be null.

When I try with ADF I get this error:

Copy activity encountered a user error: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'CarbohydratesFactor' contains an invalid value ' '. Cannot convert ' ' to type 'Decimal'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=Input string was not in a correct format.,Source=mscorlib,'.

FoodDescriptionsAzureBlob:

{
    "name": "FoodDescriptionsAzureBlob",
    "properties": {
        "structure": [
            {
                "name": "NutrientDatabankNumber",
                "type": "Int32"
            },
            {
                "name": "FoodGroupCode",
                "type": "Int32"
            },
            {
                "name": "LongDescription",
                "type": "String"
            },
            {
                "name": "ShortDescription",
                "type": "String"
            },
            {
                "name": "CommonName",
                "type": "String"
            },
            {
                "name": "ManufacturerName",
                "type": "String"
            },
            {
                "name": "Survey",
                "type": "String"
            },
            {
                "name": "ReferenceDescription",
                "type": "String"
            },
            {
                "name": "RefusePercentage",
                "type": "Int32"
            },
            {
                "name": "ScientificName",
                "type": "String"
            },
            {
                "name": "NitrogenFactor",
                "type": "Decimal"
            },
            {
                "name": "ProteinFactor",
                "type": "Decimal"
            },
            {
                "name": "FatFactor",
                "type": "Decimal"
            },
            {
                "name": "CarbohydratesFactor",
                "type": "Decimal"
            }
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "fileName": "FOOD_DES.txt",
            "folderPath": "gym-nutrition-data/NutrientData/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "^",
                "nullValue": "",
                "quoteChar": "~"
            }
        },
        "availability": {
            "frequency": "Minute",
            "interval": 15
        },
        "external": true,
        "policy": {}
    }
}

FoodDescriptionsSQLAzure:

{
    "name": "FoodDescriptionsSQLAzure",
    "properties": {
        "structure": [
            {
                "name": "NutrientDatabankNumber",
                "type": "Int32"
            },
            {
                "name": "FoodGroupCode",
                "type": "Int32"
            },
            {
                "name": "LongDescription",
                "type": "String"
            },
            {
                "name": "ShortDescription",
                "type": "String"
            },
            {
                "name": "CommonName",
                "type": "String"
            },
            {
                "name": "ManufacturerName",
                "type": "String"
            },
            {
                "name": "Survey",
                "type": "String"
            },
            {
                "name": "ReferenceDescription",
                "type": "String"
            },
            {
                "name": "RefusePercentage",
                "type": "Int32"
            },
            {
                "name": "ScientificName",
                "type": "String"
            },
            {
                "name": "NitrogenFactor",
                "type": "Decimal"
            },
            {
                "name": "ProteinFactor",
                "type": "Decimal"
            },
            {
                "name": "FatFactor",
                "type": "Decimal"
            },
            {
                "name": "CarbohydratesFactor",
                "type": "Decimal"
            }
        ],
        "published": false,
        "type": "AzureSqlTable",
        "linkedServiceName": "AzureSqlLinkedService",
        "typeProperties": {
            "tableName": "FoodDescriptions"
        },
        "availability": {
            "frequency": "Minute",
            "interval": 15
        }
    }
}

Pipeline:

{
    "name": "NutrientDataBlobToAzureSqlPipeline",
    "properties": {
        "description": "Copy nutrient data from Azure BLOB to Azure SQL",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "treatEmptyAsNull": true
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "FoodGroupDescriptionsAzureBlob"
                    }
                ],
                "outputs": [
                    {
                        "name": "FoodGroupDescriptionsSQLAzure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "FoodGroupDescriptions",
                "description": "#1 Bulk Import FoodGroupDescriptions"
            },
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "treatEmptyAsNull": true
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "FoodDescriptionsAzureBlob"
                    }
                ],
                "outputs": [
                    {
                        "name": "FoodDescriptionsSQLAzure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "FoodDescriptions",
                "description": "#2 Bulk Import FoodDescriptions"
            }
        ],
        "start": "2015-07-14T00:00:00Z",
        "end": "2015-07-14T00:00:00Z",
        "isPaused": false,
        "hubName": "gymappdatafactory_hub",
        "pipelineMode": "Scheduled"
    }
}

I tried to set up "treatEmptyAsNull": true in pipeline, with no luck.

2
Try removing "nullValue": "", from your format node. - JustLogic

2 Answers

3
votes

I had to remove "rowDelimiter": "\n",from blob dataset.

2
votes

Mostly the source text file has row delimiter "\r\n". With the row delimiter setting "\n", the last column will have data value "\r" which is not an empty string and won't be treated as null value. Without the row delimiter setting, ADF Copy would by default use row delimiter "\r\n", and the last column will have empty string and can be treated as null.