0
votes

I have decided to try and use Azure Data Factory to replicate data from one SQL Managed Instance Database to another with some trimming of the data in the process.

I have set up two Datasets to each Database / Table imported the schema ok (these are duplicated so identical) created a dataflow with one as the source and updated the schema in the projection, added a simple AlterRow (column != 201) gave it the PK then I add the second dataset as the sink and for some reason in the mapping all the output columns are showing as 'string' but the input columns show correctly.

because of this the mapping fails as it thinks the input and output are not matching? I cant understand why both Schema's in the dataset show correctly and the projection in the dataflow for the source shows correctly but it thinks i am outputting to all string columns?

TIA

1
Can you try importing the schema directly in the projection by using "Import Projection" in your data flow source transform instead of relying on the dataset schema? - Mark Kromer MSFT
Thanks Mark, I am doing both I also Import the Projection in the source in the data flow and that works fine, its the Sink that doesnt have the correct schema but I dont see anywhere on the Sink to set or import the schema? maybe I am missing a step? - Col.Cook
So I removed the Schema from both the datasets and imported the projection in my source and it was still not happy so I had to set it to Allow Schema Drift, which is a little strange as I know I am moving like for like so would prefer that the schema remains validated? - Col.Cook
If you remove the schema, you are, by nature, using schema drift. You can just leave auto-mapping on the sink and ADF will map incoming columns names as the outgoing column names. If you need to change or custom map columns when you have no schema, use rule-based mapping in the sink. - Mark Kromer MSFT

1 Answers

0
votes

Here is an easy way to map a set of unknown incoming fields to a defined database table schema ... Add a Select transformation before your Sink. Paste this into the Script behind for the Select:

select(mapColumn( each(match(true())) ), skipDuplicateMapInputs: true, skipDuplicateMapOutputs: true) ~> automap

Now, in your Sink, just leave schema drift and automapping on.