6
votes

I have a table which looks like this:

id  timestamp  value1  value2
 1  09:12:37     1       1
 1  09:12:42     1       2
 1  09:12:41     1       3
 1  10:52:16     2       4
 1  10:52:18     2       5
 2  09:33:12     3       1
 2  09:33:15     3       2
 2  09:33:13     3       3

I need to group by id and value1. For each group i want to have the row with the highest timestamp.

The result for the table above would look like this:

id  timestamp  value1  value2
 1  09:12:42     1       2
 2  09:33:15     3       2

I know there is the summarize operator which would give me this:

mytable
| project id, timestamp, value1, value2
| summarize max(timestamp) by id, value1

Result:
     id  timestamp  value1
      1  09:12:42     1
      2  09:33:15     3

But i was not able to get value2 for this rows too.

Thanks in advance

2

2 Answers

12
votes

If i understand your question correctly, you should be able to use summarize arg_max():

doc: https://docs.microsoft.com/en-us/azure/kusto/query/arg-max-aggfunction

datatable(id:long, timestamp:datetime, value1:long, value2:long)
[
 1, datetime(2019-03-20 09:12:37), 1, 1,
 1, datetime(2019-03-20 09:12:42), 1, 2,
 1, datetime(2019-03-20 09:12:41), 1, 3,
 1, datetime(2019-03-20 10:52:16), 2, 4,
 1, datetime(2019-03-20 10:52:18), 2, 5, // this has the latest timestamp for id == 1
 2, datetime(2019-03-20 09:33:12), 3, 1,
 2, datetime(2019-03-20 09:33:15), 3, 2, // this has the latest timestamp for id == 2
 2, datetime(2019-03-20 09:33:13), 3, 3,
]
| summarize arg_max(timestamp, *) by id

This will result with:

| id | timestamp                   | value1 | value2 |
|----|-----------------------------|--------|--------|
| 2  | 2019-03-20 09:33:15.0000000 | 3      | 2      |
| 1  | 2019-03-20 10:52:18.0000000 | 2      | 5      |
0
votes

I found a solution to my problem, but there might be a better one.

mytable
| project id, timestamp, value1, value2
| order by timestamp desc
| summarize max(timestamp), makelist(value2) by id, value1

Results in:

 id  timestamp  value1  list_value2
  1  09:12:42     1     ["2", "3", "1"]
  2  09:33:15     3     ["2", "3", "1"]

Now you can extend the query by adding

| project max_timestamp, id, value1, list_value2[0]

to get the first element from that list. Replace '0' by any number between 0 and length(list_value2)-1 to access the other values.

One more advice: The timestamp i use is the one that is generated by ApplicationInsights. In our code we call TrackTrace to log some data. If you order the rows by this timestamp, the resulting list of rows is not garanteed to be in the same order in which the data was produced in code.