I have defined one pipeline with one data flow. The data flow does the following (a rough sketch of the generated script follows the list):

  1. Reads a document from a Blob source with a JSON schema
  2. Looks it up against a second source from CosmosDB with the same schema, with a simple lookup on equality of one field.
  3. Merges an array property that comes from the database with the one that comes from the blob
  4. Upserts the resulting document back into the same CosmosDB collection through a corresponding sink.
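
For context, the data flow script that ADF generates behind such a flow looks roughly like the sketch below. Stream names, field names and the lookup key are all invented for illustration, and the derived column that actually merges the two arrays (step 3) is left out for brevity:

    source(output(
            id as string,
            title as string,
            items as (sku as string, qty as integer, price as double)[]
        ),
        allowSchemaDrift: true,
        validateSchema: false) ~> BlobSource
    source(output(
            id as string,
            title as string,
            items as (sku as string, qty as integer, price as double)[]
        ),
        allowSchemaDrift: true,
        validateSchema: false,
        format: 'document') ~> CosmosSource
    BlobSource, CosmosSource lookup(BlobSource@id == CosmosSource@id,
        multiple: false,
        pickup: 'any',
        broadcast: 'auto') ~> LookupExisting
    LookupExisting sink(allowSchemaDrift: true,
        validateSchema: false,
        deletable: false,
        insertable: false,
        updateable: false,
        upsertable: true,
        format: 'document') ~> CosmosSink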

Even though the document schemas are the same for the different sources and the sink, I have defined two different schemas: one for the blob and one for CosmosDB, the latter used by both the source and the sink.

The JSON document schema itself is nothing complex: a few properties under the root and an array property of flat documents with a few properties each, which is the part that gets merged. Some of these properties are ints or doubles; the rest are strings. Documents are processed correctly, with the content of the array correctly merged, and are then either updated or inserted in the CosmosDB collection.
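
Using the same invented names as in the sketch above, a document of this shape would look something like:

    {
        "id": "doc-1",
        "title": "some title",
        "items": [
            { "sku": "A-100", "qty": 3, "price": 9.95 },
            { "sku": "B-200", "qty": 7, "price": 24.5 }
        ]
    }

In this example, qty is the kind of field that ends up in CosmosDB as "3" and "7" instead of 3 and 7.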

However, none of the int fields are written as such; they are all converted to strings. Doubles seem to be handled correctly. The schema is correct throughout the data flow. I even tried adding an explicit transformation to cast the type to int prior to the sink, yet the outcome is still the same.

I then looked a bit under the hood and found that the script generated behind the scenes contains a sink definition with the wrong field types: instead of ints, the fields are all strings. So I decided to outsmart ADF and edited the script manually. After running a Publish, though, I was proven wrong: ADF is smarter than me. In the publish branch the script was magically reverted back to its original state, strings instead of ints for the fields in the sink, while the dev branch clearly contains the correctly (though manually) defined types. Very annoying indeed!
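
For illustration (with the same invented names, and MergeArrays standing for the stream that comes out of the merge step), the sink block in the generated script comes out along these lines, with the int field declared as string:

    MergeArrays sink(input(
            id as string,
            title as string,
            items as (sku as string, qty as string, price as double)[]
        ),
        allowSchemaDrift: true,
        validateSchema: false,
        deletable: false,
        insertable: false,
        updateable: false,
        upsertable: true,
        format: 'document') ~> CosmosSink

The manual edit amounted to changing qty as string back to qty as integer in this block, and that is exactly the change that Publish keeps reverting.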

ADF has come a long way since v1 and now resembles (and even improves on) the dev experience of SSIS, yet the lack of control over the data types of fields/columns, at least at the source/sink points, seems somewhat childish. And this magic conversion of the types from int to string during publish (!?!) adds two more points heading south for the time being :(

Any idea whether this is a known issue and, moreover, whether there is a known workaround would be highly appreciated!

1 Answer

You can add a Derived Column transformation after step 3 and use the expression toInteger(yourColumn) to cast the field back to an integer before it reaches the sink.
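
A minimal sketch of that in data flow script terms, with MergeArrays and qty as invented names for the merged stream and the column:

    MergeArrays derive(qty = toInteger(qty)) ~> CastToInt

If the integer fields live inside the array items rather than at the root, the cast may need to be applied per item, for example with map():

    MergeArrays derive(items = map(items, @(sku = #item.sku, qty = toInteger(#item.qty), price = #item.price))) ~> CastToInt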

I think this is a similar question; you can refer to it.