
I am planning to use Azure Data Factory to back up Azure Table storage. The entities in my Azure Table could change their schema. Is there a way an Azure Data Factory pipeline could handle this without manual intervention every time the schema changes?

For example, the first entity might be:

  <entry>
    <content type="application/xml">
      <m:properties>
        <d:PartitionKey>P1</d:PartitionKey>
        <d:RowKey>R1</d:RowKey>
        <d:Timestamp m:type="Edm.DateTime">2017-05-22T20:37:34.8743000Z</d:Timestamp>
        <d:IsDefault m:type="Edm.Boolean">False</d:IsDefault>
      </m:properties>
    </content>
  </entry>

Another entity could be:

  <entry>
    <content type="application/xml">
      <m:properties>
        <d:PartitionKey>P2</d:PartitionKey>
        <d:RowKey>R2</d:RowKey>
        <d:Timestamp m:type="Edm.DateTime">2017-05-22T20:37:34.8743000Z</d:Timestamp>
        <d:IsDefault m:type="Edm.Boolean">False</d:IsDefault>
        <d:IsTest m:type="Edm.Boolean">False</d:IsTest>
      </m:properties>
    </content>
  </entry>

I don't want to change my dataset every time an entity changes.

According to the docs: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-faq

If the structure and jsonPathDefinition are not defined in the Data Factory dataset, the Copy Activity detects the schema from the first object and flattens the whole object.

Is there a workaround for this problem?


1 Answer


The entities in my Azure Table could change their schema. Is there a way an Azure Data Factory pipeline could handle this without manual intervention every time the schema changes?

According to this article, for schema-free data stores such as Azure Table, the Data Factory service infers the schema in one of the following ways:

1. If you specify the structure of data by using the structure property in the dataset definition, the Data Factory service honors this structure as the schema. In this case, if a row does not contain a value for a column, a null value is provided for it.

2. If you don't specify the structure of data by using the structure property in the dataset definition, Data Factory infers the schema by using the first row in the data. In this case, if the first row does not contain the full schema, some columns are missing from the result of the copy operation.
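The difference between the two modes can be illustrated with a small simulation. This is only a sketch in plain Python, not Data Factory code: the helper names `columns_from_first_row` and `project` are hypothetical, and the sample rows mirror the two entities from the question.

```python
# Simulates the two schema-inference modes described above (illustration only).

def columns_from_first_row(rows):
    """Mode 2: the schema is whatever properties the first row happens to have."""
    return list(rows[0].keys()) if rows else []

def project(rows, columns):
    """Copy each row using a fixed column list; absent values become None (null)."""
    return [{name: row.get(name) for name in columns} for row in rows]

rows = [
    {"PartitionKey": "P1", "RowKey": "R1", "IsDefault": False},
    {"PartitionKey": "P2", "RowKey": "R2", "IsDefault": False, "IsTest": False},
]

# Mode 2: inferred from the first row, so IsTest is dropped from every row.
inferred = project(rows, columns_from_first_row(rows))

# Mode 1: explicit structure, so IsTest is kept and is null where absent.
explicit = project(rows, ["PartitionKey", "RowKey", "IsDefault", "IsTest"])

print(inferred[1])
print(explicit[0])
```

With the inferred schema the second entity loses its IsTest property, while the explicit structure keeps it and fills in a null for the first entity.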

If you do not want to specify the structure property in the dataset definition manually, you could store the table schema in another table or blob and update it whenever the schema changes. You could then create a custom activity with the .NET SDK that programmatically defines the structure property from the stored schema when it creates the datasets.
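The schema-merging step of that approach could look roughly like the sketch below. It is written in Python for brevity rather than with the .NET SDK, and it covers only the merge, not reading from Azure Table or creating the dataset. The names `merge_schema`, `to_structure`, and `TYPE_NAMES` are hypothetical, and the mapping from Python types to structure type names is an assumption that should be checked against the dataset documentation.

```python
import json
from datetime import datetime

# Assumed mapping from Python value types to the type names used in a
# dataset "structure" section; unknown types fall back to "String".
TYPE_NAMES = {bool: "Boolean", int: "Int64", float: "Double",
              str: "String", datetime: "Datetime"}

def merge_schema(stored, entities):
    """Return the stored schema extended with any property not seen before."""
    merged = dict(stored)
    for entity in entities:
        for name, value in entity.items():
            # Existing columns keep their recorded type; new ones are appended.
            merged.setdefault(name, TYPE_NAMES.get(type(value), "String"))
    return merged

def to_structure(schema):
    """Convert a name -> type mapping into a list of name/type entries."""
    return [{"name": name, "type": type_name} for name, type_name in schema.items()]

# Schema previously saved in a blob or table (hypothetical contents).
stored_schema = {"PartitionKey": "String", "RowKey": "String", "IsDefault": "Boolean"}

# Newly observed entities, here the second entity from the question.
new_entities = [{"PartitionKey": "P2", "RowKey": "R2", "IsDefault": False, "IsTest": False}]

schema = merge_schema(stored_schema, new_entities)
structure = to_structure(schema)
print(json.dumps({"structure": structure}, indent=2))
```

The resulting list can be written back to the stored schema and placed in the dataset definition, so columns are only ever added and a property that is missing from a given row is copied as null.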