0
votes

I've trying to identify which query is causing my workload to stall, according to the metrics (Metrics (preview) tab in Azure Portal) I see: 100% DTU utilization, caused by the CPU enter image description here

But when I go to QDS I see a different picture: enter image description here

And the reported queries by QDS in this period don't take that as long as the DTU cap is being hit.

enter image description here

I know that the 1 minute reported by the metrics view is the correct one, since the operation from the user side takes that long and I can see in the Web App telemetry the app not responding in this time period.

So how can I identify the query that hits the DTU limit?

P.S. The db is an S0.

UPDATE @Alberto Morillo, I've executed the query, it there are a lot of cheap queries ran (~2k) - the largest values for total_worker_time are in the 54k (54 ms). On the other hand I see the wait stats is dominated by SOS_WORK_DISPATCHER.

Does this mean that the queries are blocking because the workers can't be spawned by the scheduler that fast?

1

1 Answers

0
votes

Please run the following query:

SELECT TOP 10 q.query_id, p.plan_id, 
rs.count_executions, 
qsqt.query_sql_text,
CONVERT(NUMERIC(10,2), (rs.avg_cpu_time/1000)) as 'avg_cpu_time_seconds',
CONVERT(NUMERIC(10,2),(rs.avg_duration/1000)) as 'avg_duration_seconds',
CONVERT(NUMERIC(10,2),rs.avg_logical_io_reads ) as 'avg_logical_io_reads',
CONVERT(NUMERIC(10,2),rs.avg_logical_io_writes ) as 'avg_logical_io_writes',
CONVERT(NUMERIC(10,2),rs.avg_physical_io_reads ) as 'avg_physical_io_reads',
CONVERT(NUMERIC(10,0),rs.avg_rowcount ) as 'avg_rowcount'
from sys.query_store_query q  
JOIN sys.query_store_plan p ON q.query_id = p.query_id 
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_query_text qsqt
  ON q.query_text_id = qsqt.query_text_id
WHERE rs.last_execution_time > dateadd(hour, -1, getutcdate())
ORDER BY rs.avg_duration DESC

Change the ORDER BY clause to avg_cpu_time and avg_rowcount also.