0
votes

Over the past week or two, we've seen a four cases where our Azure SQL Database DTU graph ends up looking like this:

enter image description here

That is, it seems to "restart" (note that the graph consistently shows 0 DTUs before the spike, which was definitely not the case because we have constant traffic on this server). This seems to indicate that the DTU measurements restarted. The large spike, followed by the subsequent decaying and stabilizing DTU values seems to indicate to us that the database is "warming up" (presumably doing things like populating caches and organizing indexes perhaps?). The traffic to the web app that accesses this database showed nothing abnormal over the same time period, so we don't have any reason to think that this is a result of "high load".

The "Activity Log" item in Azure doesn't show any information. Looking at the "Resource Health" of our database, however, we saw the following:

enter image description here

Note the A problem with your SQL database has been resolved. The timestamp however doesn't exactly correspond to the time of the spike above (the graph is showing UTC+1 time, and presumably the resource-health timestamp is in UTC, so it's about 1.15hrs difference).

Clicking on "View History" gives us all such events for the past couple of weeks:

enter image description here

In each case the database is "available" again within the refresh-granularity (2 minutes), again suggesting restarts. Interestingly, the restarts are around 4 days apart in each case.

Of course I expect and understand that the database be moved around and restarted from time to time. Our web app is Asp.Net Core 2.0 and uses connection resiliency, so we don't have any failing requests.

That said, considering that this has been happening relatively frequently in the last few weeks, I'm of course wondering if this is something that needs action from our side. We did, for example, upgrade to Entity Framework Core 2.0 around 5 weeks ago, so I'm slightly concerned that that might have something to do with it.

My questions:

  • Is there any way to know for sure that the database server restarted? Is this information stored in the database itself anywhere, or perhaps on the master database?

  • Is there any way to know the reason for such restarts, and whether or not it's "our fault" or simply a result of hosting-environment changes? Does the Azure team make such information publicly available anywhere?

The database is on S3 Standard level (100 DTUs) and is hosted in South-East Asia. It's around 3.5GB in size.

1
Best way would be to raise a ticket with support and demand what happenedTheGameiswar

1 Answers

1
votes

Please enable Query Store to identify queries and statements involved on those spikes you see on the DTU consumption graph.

ALTER DATABASE [DB1] SET QUERY_STORE = ON;

Then use a query like below to identify long running queries and the tables involved with them. The name of the tables may give you an idea on what is creating those spikes.

SELECT TOP 10 rs.avg_duration, qt.query_sql_text, q.query_id,  
    qt.query_text_id, p.plan_id, GETUTCDATE() AS CurrentUTCTime,   
    rs.last_execution_time   
FROM sys.query_store_query_text AS qt   
JOIN sys.query_store_query AS q   
    ON qt.query_text_id = q.query_text_id   
JOIN sys.query_store_plan AS p   
    ON q.query_id = p.query_id   
JOIN sys.query_store_runtime_stats AS rs   
    ON p.plan_id = rs.plan_id  
WHERE rs.last_execution_time > DATEADD(hour, -1, GETUTCDATE())  
ORDER BY rs.avg_duration DESC;

About the downtimes logged on Resource Health, it seems they are related to maintenance tasks because they occur every 4 days, but I will report it to SQL Azure team and try to get a feedback.