I'm trying to do a delta load from Azure DevOps to an Azure Data Explorer (Kusto) cluster using ADF. Since Kusto doesn't have an option to update or delete specific records, I have implemented the incremental data load with the steps below.
- Data is loaded as a full load into the main table (TableA) the first time.
- I then get the max modified date from TableA and load only the recently modified data into TableStg in Kusto, using the filter `>= (maxmodifieddate - 1 day)`.
- I then create a temp table (TableTemp) in Kusto to merge the modified/new data (from TableStg) with the existing data into a single table, and replace the main table (TableA) with the temp table using the KQL below.
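For reference, the high-watermark lookup in the second step is a simple aggregation run from an ADF Lookup activity (a sketch; the column name `ChangedDate` is taken from the merge query below, and `maxmodifieddate` is just an illustrative alias):

```kusto
// Fetch the high-watermark (latest modified date) from the main table.
// ADF then uses (maxmodifieddate - 1d) as the lower bound of the delta copy.
TableA
| summarize maxmodifieddate = max(ChangedDate)
```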
```kusto
.set-or-replace TableTemp with (distributed=true) <|
    set notruncation;
    let updated = TableStg
        | union TableA
        | summarize maxdate = max(ChangedDate) by WorkItemId;
    let mergeupdate = (TableStg | union TableTemp)
        | distinct *
        | join kind=inner updated on $left.WorkItemId == $right.WorkItemId
        | where ChangedDate == maxdate;
    mergeupdate
    | project-away WorkItemId1, maxdate
```

```kusto
.drop table TableA ifexists;

.rename tables TableA = TableTemp;
```
With a large number of records, this query fails with the memory error below.
"error": { "code": "LimitsExceeded", "message": "Request is invalid and cannot be executed.", "@type": "Kusto.DataNode.Exceptions.KustoQueryRunawayException", "@message": "HResult: 0x80DA0001, source: (Partial query failure: Low memory condition (E_LOW_MEMORY_CONDITION). (message: 'bad allocation', details: ''))"
Is there any way to optimize this query and achieve the delta load?
Is there any other way to implement an incremental load into Kusto?