
I'm trying to do a delta load from Azure DevOps to an Azure Data Explorer (Kusto) cluster using ADF. As Kusto doesn't have an option to update or delete specific records, I have followed the steps below to implement an incremental data load.

  1. Data is loaded as a full load into the main table (TableA) the first time.
  2. I get the max modified date from TableA and load only the latest modified data into TableStg in Kusto, using the filter ChangedDate >= (maxmodifieddate - 1 day).
  3. I create a temp table (TableTemp) in Kusto to merge the modified/new data (from TableStg) with the existing data into a single table, and then replace the main table (TableA) with the temp table using the KQL below.

    .set-or-replace TableTemp with (distributed=true) <|
        set notruncation;
        let updated = TableStg
            | union TableA
            | summarize maxdate = max(ChangedDate) by WorkItemId;
        let mergeupdate = (TableStg | union TableA)
            | distinct *
            | join kind=inner updated on $left.WorkItemId == $right.WorkItemId
            | where ChangedDate == maxdate;
        mergeupdate
        | project-away WorkItemId1, maxdate

.drop table TableA ifexists;

.rename tables TableA=TableTemp;
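For reference, one way to shrink the memory footprint of the merge step is to replace the `summarize`/`join`/`distinct` combination with a single `arg_max` pass, optionally shuffled by the key. This is only a sketch assuming the `WorkItemId`/`ChangedDate` schema from the question, not a tested rewrite:

```kusto
// Sketch: pick the latest row per WorkItemId in one pass.
// hint.shufflekey distributes the aggregation across nodes,
// which can avoid a single-node low-memory condition.
.set-or-replace TableTemp with (distributed=true) <|
    set notruncation;
    TableStg
    | union TableA
    | summarize hint.shufflekey = WorkItemId arg_max(ChangedDate, *) by WorkItemId
```

Whether the shuffle hint actually helps depends on the cardinality of `WorkItemId`.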

With a large number of records, this query fails with the memory error below.

"error": { "code": "LimitsExceeded", "message": "Request is invalid and cannot be executed.", "@type": "Kusto.DataNode.Exceptions.KustoQueryRunawayException", "@message": "HResult: 0x80DA0001, source: (Partial query failure: Low memory condition (E_LOW_MEMORY_CONDITION). (message: 'bad allocation', details: ''))" }

Is there any option to optimize this query and achieve delta load?

Do we have any other way to implement incremental load in Azure Kusto?


1 Answer


You have a couple of simpler options:

  1. You may be able to use an update policy, if you can guarantee that each ADF ingestion happens only after the previous one has completed.
  2. You can also use materialized views to apply the "last updated" logic.
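As a sketch of the second option (the view name is made up here, and the table/column names are taken from the question), a materialized view can maintain the latest record per work item automatically:

```kusto
// Hypothetical view name; TableA/WorkItemId/ChangedDate are from the question.
.create materialized-view WorkItemsLatest on table TableA
{
    TableA
    | summarize arg_max(ChangedDate, *) by WorkItemId
}
```

With this in place, ADF keeps appending deltas to TableA, the view materializes the "last updated" row per `WorkItemId` in the background, and queries read from `WorkItemsLatest` instead of deduplicating at query time.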