
I receive an error from time to time due to incompatible data in my source dataset compared to my target dataset. I would like to control the action the pipeline takes based on the error type, perhaps outputting or dropping the particular failing rows while completing everything else. Is that possible? Furthermore, is there a simple way to get hold of the actual failing line(s) from Data Factory without accessing and searching the actual source dataset?

Copy activity encountered a user error at Sink side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'Timestamp' contains an invalid value '11667'. Cannot convert '11667' to type 'DateTimeOffset'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'.

Thanks


1 Answer


I think you've hit a fairly common problem and limitation within ADF. The datasets you define in your JSON allow ADF to understand the structure of the data, but that is all they do; the orchestration tool can't transform or manipulate the data as part of the activity processing.

To answer your question directly: yes, it's certainly possible, but you'll need to break out the C# and use ADF's extensibility features to deal with your bad rows before passing the data on to the final destination.

I suggest you extend your data factory with a custom activity in which you can build some lower-level cleaning logic to divert the bad rows as described.

This is an approach we often take, as not all data is perfect (I wish) and plain ETL or ELT doesn't cut it. I prefer the acronym ECLT, where the 'C' stands for clean (or cleanse, prepare, etc.). This certainly applies to ADF, because the service has no compute of its own and no SSIS-style data flow engine.

So...

In terms of how to do this: first, I recommend you check out this blog post on creating ADF custom activities.

https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/

Then, within your C# class that implements IDotNetActivity, do something like the following.

    public IDictionary<string, string> Execute(
        IEnumerable<LinkedService> linkedServices,
        IEnumerable<Dataset> datasets,
        Activity activity,
        IActivityLogger logger)
    {
        // etc: resolve the input/output locations from the
        // linked services and datasets passed in above

        using (StreamReader vReader = new StreamReader(YourSource))
        {
            using (StreamWriter vWriter = new StreamWriter(YourDestination))
            {
                while (!vReader.EndOfStream)
                {
                    string vLine = vReader.ReadLine();
                    // data transform logic: if it's a bad row,
                    // divert it; otherwise write it to vWriter
                }
            }
        }

        // the interface expects a property dictionary back
        return new Dictionary<string, string>();
    }

You get the idea. Build your own SSIS data flow!
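As a sketch of what the "bad row" check inside that while loop might look like, here's a hypothetical helper (the comma delimiter and the assumption that 'Timestamp' is the first column are invented for illustration; adjust both to your real source layout):

```csharp
using System;

// Hypothetical helper for the loop body above: validate the
// 'Timestamp' column of a delimited row before writing it on.
public static class RowCleaner
{
    public static bool HasValidTimestamp(string line, int timestampColumn = 0)
    {
        string[] fields = line.Split(',');
        if (timestampColumn >= fields.Length)
            return false;

        // Mirrors the sink-side conversion: a value like '11667'
        // fails here too, so you can divert it before the copy.
        DateTimeOffset parsed;
        return DateTimeOffset.TryParse(fields[timestampColumn], out parsed);
    }
}
```

Lines that fail the check can go to a second "rejects" file instead of the destination, which also answers your second question: the rejects file gives you the actual failing lines without searching the source dataset.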

Then write out your clean rows as an output dataset, which can serve as the input for your next ADF activity, either with multiple pipelines or as chained activities within a single pipeline.
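For the chaining itself, the output dataset of the custom activity simply becomes the input of the next activity in the pipeline JSON, along these lines (all dataset and activity names here are invented placeholders):

```json
"activities": [
    {
        "name": "CleanRows",
        "type": "DotNetActivity",
        "inputs": [ { "name": "RawDataset" } ],
        "outputs": [ { "name": "CleanDataset" } ]
    },
    {
        "name": "CopyClean",
        "type": "Copy",
        "inputs": [ { "name": "CleanDataset" } ],
        "outputs": [ { "name": "TargetDataset" } ]
    }
]
```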

This is the only way you will get ADF to deal with your bad data in the current service offerings.

Hope this helps