While improving the performance of a Kusto query, I came across the shuffle strategy for join/summarize. I can clearly see the performance benefits of this strategy for my query, which has high cardinality for the join/summarize key.
While reading the shuffle query Kusto documentation, it seemed that the strategy will be ignored when there are nested shuffle operators.
When the shuffled operator has other shuffle-able operators, like summarize or join, the query becomes more complex and then hint.strategy=shuffle won't be applied.
My query uses nested summarize and join (with shuffle) but I also clearly see performance gains. My query pattern:
Table1
| summarize hint.strategy=shuffle arg_max(Timestamp) by Device, Interface
| join hint.strategy=shuffle (Table2) on Device, Interface
Does a query like this benefit from shuffling?
Also, does the Kusto query planner avoid any problematic shuffle if present always? Basically I wanted to rest assured that there might only be perf issues with a wrongly used/authored shuffle and not data issues.