I succeeded to solve that using that using row_window_session()
. This is a Window function. You can read more about it at Window functions overview.
The solution is:
let reqs = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.000Z"), "r1", 31 // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.002Z"), "r1", 7, // D1
datetime("2020-12-15T08:00:00.007Z"), "r1", 8, // D2
datetime("2020-12-15T08:00:00.021Z"), "r1", 6, // D3
datetime("2020-12-15T08:00:00.006Z"), "r1", 4, // D5
datetime("2020-12-15T08:00:00.023Z"), "r1", 2, // D4
];
deps
| extend endTime = timestamp + totimespan(duration * 10000)
| sort by timestamp asc
| serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
| summarize max(endTime) by operation_ParentId, SessionStarted
| extend diff = max_endTime - SessionStarted
| summarize todouble(sum(diff)) by operation_ParentId
| join reqs on operation_ParentId
| extend diff = duration - sum_diff / 10000
| project diff
The idea here is to sort the entries by the open time, and as long as the next previous end time is later than the current start time, we don't open a new session. Let's explain each line of this query to see how this is being done:
- Calculate the
endTime
based on the duration. To normalize the data I'll multiply by 10000 the duration:
| extend endTime = timestamp + totimespan(duration * 10000)
- Sort by start time:
| sort by timestamp asc
- This is the key of this solution. It is calculated on the
timestamp
column. The next two parameters are limits when to start new buckets. Since we don't want to seal a bucket based on time that have passed, I provided 1 hour which will not hit with this input. The forth argument helps us to create a new session based on the data. As long as there are more rows that will result in timestamp > prev(endTime)
they will have the same start time.
| serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
- Now we have multiple rows per session start. So we want to keep only the latest time per session. We also keep
operation_ParentId
to later on join on that key:
| summarize max(endTime) by operation_ParentId, SessionStarted
- Calculate the time of each session:
| extend diff = max_endTime - SessionStarted
- Sum up all session times:
| summarize todouble(sum(diff)) by operation_ParentId
- Join on
req
to get the total starting time:
| join reqs on operation_ParentId
- Calculate the diff between the total time and the session times. Unnormalize the data:
| extend diff = duration - sum_diff / 10000
- Project the final result:
| project diff
You can find this query running at Kusto Samples open database.
Having said that, please note that this is a linear operation. Meaning that if there are 2 following segments, that should be under the same segment, but they do not intersect, it will fail. For example, adding the following into deps
:
datetime("2020-12-15T08:00:00.026Z"), "r1", 1, // D6
which should not add anything to the calculation, cause it to misbehave. This is because d4
is the previous point, and it has no point of contact with d6
, although d3
covers them both.
To solve that, you need to repeat the same logic of steps 3-5. Unfortunately Kusto does not have recursions, therefore you cannot solve this for any kind of input. But assuming there are no really depth such cases that breaks this logic, I think it is good enough.