0
votes

Below is the hypothetical schema.

Table : With billions of entries
Column: info 
Column: id_1
Column: data_1
Column: Uid
Column: time_interval

What I am trying to query?

I need to get a list of Uid from Table which match some criteria like

  • the time_interval is just 5 minutes
  • the info contains "info"
  • the id_ is equal to "Guid1"
  • the data_1 has this pattern "Something"

Also, Multiple requests can span for each uid in Table like

  Uid    time_interval   ...other_colums....
=====================
  uid1   00:00:12  
  uid1   00:00:13
  uid1   00:00:15

Once I get the list of Uid and store it as scalar Say it is [uid1, uid2, uid3]. Then, I need to query Table again and compare each of the values in the list of scalars to find the difference between the maximum and minimum time for each uid Say for uid1 example above : the time difference would have: (00:00:15 - 00:00:12) milliseconds.

So my result for [uid1, uid2, uid3] should be:

Uid      time_diff
====     =========
uid1      2 ms
uid2      3 ms
uid3      4 ms

I have the following query below for this, but the subquery which uses scalar just takes last value from the scalar defined "data". Please suggest how can I make the subquery iterate over the results in "data"

What I tried so far?

Also I cannot combine these into one because the data is such that after putting where clauses I just get one record and there is no time interval as such to calculate the difference. Thats why I need to first get all the ids and then query for time interval.

let data = toscalar(Table
| where time_interval between (ago(5m)..now())
| where info has "info"
| where id_1 == "Guid1"
| where data_1 matches regex "Something"
| project pack("uid", Uid));

Table
| where time_interval between (ago(5m)..now())
| where Uid in (data["uid"])
//| summarize dt1 = max(time_interval) - min(time_interval) by Uid
| project Uid, Rid, time_interval

2

2 Answers

2
votes

Please check if the below is what you're looking for. Note that the in() operator is limited to 1 million values, so if you have more distinct ids after the filter, you will need to use inner join between the 2 data sets (shuffle join might be helpful if cardinality is very high).

let T = datatable(time_interval:datetime, id_1:string, info:string, data_1:string) 
[
    datetime(2021-05-21 10:53:04), "Guid1", "no match", "this has something in it",
    datetime(2021-05-21 10:53:06), "Guid1", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:09), "Guid1", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:04), "Guid2", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:08), "Guid2", "this has info in it", "this has something in it"
];
let ids = T
| where time_interval between (datetime(2021-05-21 10:50)..5m)
| where info has "info"
| where id_1 == "Guid1"
| where data_1 matches regex "something"
| distinct id_1;
T 
| where id_1 in (ids)
| summarize min(time_interval), max(time_interval) by id_1

enter image description here

1
votes

Seems like simply combining it into one query would work:

let Table = datatable(time_interval:datetime, id_1:string, info:string, data_1:string) 
    [datetime(2021-05-21 10:53:04), "Guid1", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:06), "Guid1", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:09), "Guid1", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:04), "Guid2", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:08), "Guid2", "this has info in it", "this has something in it"];
Table
| where time_interval between (ago(10d) .. now())
| where info has "info"
| where data_1 has "Something"
| summarize time_diff = max(time_interval) - min(time_interval) by id_1
| project id_1, time_diff

For example: