0
votes

For some of our logs we have the following schema:

  1. A "master" event table that collects events. Each event comes with a unique id (guid).
  2. For each event we collect additional IoT data (sensor data) which also contains the guid as a link the event table

Now, we often see the schema that someone starts with the IoT data and then wants to query the master event table. The join or query criteria is the guid. Now, as we have a lot of data the unconditioned query, of course, does not return within a short time frame, if at all.

Now what our analysts do is to use the time range as a factor. Typically, the sensor data refers to events that happend on the same day or +/- a few hours, or minutes or seconds (depends on the events). This query typically returns, but not always as fast as it could be. Given that the guid is unique, queries that explicitely state this knowledge are typically way faster than those that don't, e.g.

Event_Table | where ... | take 1

unfortuntely, everyone needs to remember those properties of the data.

After this long intro: Is there a way in Kusto to speed up those queries without explictely write "take 1"? As in, telling the Kusto engine that this column holds unique keys? I am not talking about enforcing that (as a DB unique key would do), but just to give hints to kusto on how to improve the query? Can this be done somehow?

1

1 Answers

0
votes

It sounds that you can benefit from introducing server-side function: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/schema-entities/stored-functions

Using this method - you can define a function on a server - and users will provide a parameter to the function.