3
votes

In Azure data factory we have Collections like:

intersection([1, 2, 3], [101, 2, 1, 10],[6, 8, 1, 2])

whic returns [1, 2]

I'm looking for opposite to intersection. For given two arrays, it should return elements which are not present in first array.

Ex: If we pass two arrays, 
    Array a: [1,2,3,4,5]
    Array b: [1,2,3]

Ouput: [4,5]

One way is to add two lookup activites resulting two arrays and Foreach of item in first array, if not present in second item array execute another activity.

Is there any better way to do in Azure Data factory?

Thanks

2

2 Answers

2
votes

intersection works with arrays. If you have two arrays (Array being a native Azure Data Factory type) then I don't think Lookup will work as it works with datasets? It's a shame Azure Data Factory (ADF v2) does not have the complement to union and intersection in terms of set operations which might be complement or except or minus.

However you can work around this using a for each loop and an if condition. For example, I have an ADF v2 pipeline with two array parameters like this:

ADF pipeline pic 1

The for each loop loops through the a1 parameter. Set the Items property to this:

@pipeline().parameters.a1

Within the for each loop, add an if condition:

ADF pipeline pic 2

Set the if condition to use the contains function to check the a2 parameter for the present of the current item in the for each loop ( ie item() ):

@contains(pipeline().parameters.a2,item())

Finally add an 'if false' activity, with an Append variable task, eg

ADF pipeline pic 3

a3 is a variable of type Array.

ADF pipeline pic 4

No need to spin up a Spark cluster to solve this particular problem just yet ; )

0
votes

Did you try solving this with a Join transformation in ADF Mapping Data Flows?