How to accommodate Azure SQL query that runs infrequently but is very resource intensive

Question

NOTE: I give details of my Azure setup here, but I'm not sure the solution will be an Azure-based one. This may be a problem that can be resolved at the C#, Entity Framework, or SQL level.

I have a .NET web application running on Azure App Service using Entity Framework to access an Azure SQL DB at pricing level Standard S1 (20 DTU). 99% of the time, the app is utilizing less than 1% of DTU on the SQL DB. However, when someone logs into the Admin Portal of the app and runs a particular report, it executes a query that is very resource intensive and takes a very long time - over a minute - which we can't live with. This report is run only a few times a week. I've tried scaling the SQL DB up and have found - unsurprisingly - that at higher plans, the execution time gets to a somewhat reasonable level. At Standard S4 (200 DTU), the execution time drops to 20 seconds, which is not ideal but I can live with for now. However, it doesn't make sense to pay for S4-level when 99% of the time it will be using only a fraction of a percent of DTU. Any ideas on how I can either reduce the query execution time or only scale when needed?

The Entity Framework code used for this report is:

class MyAppModelContainer : DbContext 
{
    public virtual ObjectResult<GetOrganizationList_Result> GetOrganizationList()
    {
        return ((IObjectContextAdapter)this).ObjectContext.ExecuteFunction<GetOrganizationList_Result>("GetOrganizationList");
    }
}

The model used to retrieve the results is:

public partial class GetOrganizationList_Result
{
    public int id { get; set; }
    public string Name { get; set; }
    public Nullable<int> DeviceCounts { get; set; }
    public Nullable<int> EmailCounts { get; set; }
}

The stored procedure is:

CREATE PROCEDURE [dbo].[GetOrganizationList]
AS
BEGIN
    SELECT o.Id,o.Name,COUNT(distinct s.DeviceId) as DeviceCounts, COUNT(distinct d.userid) as EmailCounts
    FROM Sessions s
    INNER JOIN Devices d on d.Id = s.DeviceId
    RIGHT OUTER JOIN Organizations o on o.id=s.OrganizationId
    GROUP BY o.Id,Name
END

The approximate number of rows in each of the joined tables: Sessions table: 2 million rows Devices table: 166,000 rows Users table: 88,000 rows

Here are the table definitions and indexes:

CREATE TABLE [dbo].[Sessions] (
    [Id]             INT      IDENTITY (1, 1) NOT NULL,
    [DeviceId]       INT      NULL,
    [StartTime]      DATETIME NOT NULL,
    [OrganizationId] INT      NOT NULL,
    CONSTRAINT [PK_Sessions] PRIMARY KEY CLUSTERED ([Id] ASC),
    CONSTRAINT [FK_DeviceSession] FOREIGN KEY ([DeviceId]) REFERENCES [dbo].[Devices] ([Id]),
    CONSTRAINT [FK_OrganizationSession] FOREIGN KEY ([OrganizationId]) REFERENCES [dbo].[Organizations] ([Id])
);

CREATE NONCLUSTERED INDEX [IX_FK_DeviceSession]
    ON [dbo].[Sessions]([DeviceId] ASC);

CREATE NONCLUSTERED INDEX [IX_FK_OrganizationSession]
    ON [dbo].[Sessions]([OrganizationId] ASC);

CREATE NONCLUSTERED INDEX [IX_Sessions_OrganizationId_Include_DeviceId]
    ON [dbo].[Sessions]([OrganizationId] ASC)
    INCLUDE([DeviceId]);    

CREATE NONCLUSTERED INDEX [IX_Sessions_OrganizationId_DeviceId] ON [dbo].[Sessions]
(
    [DeviceId] ASC,
    [OrganizationId] ASC,
    [StartTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE TABLE [dbo].[Devices] (
    [Id]         INT        IDENTITY (1, 1) NOT NULL,
    [UserId]     INT        NULL,
    [MACAddress] NCHAR (12) NOT NULL,
    CONSTRAINT [PK_Devices] PRIMARY KEY CLUSTERED ([Id] ASC),
    CONSTRAINT [FK_UserDevice] FOREIGN KEY ([UserId]) REFERENCES [dbo].[Users] ([Id]),
    CONSTRAINT [IX_Unique_MACAddress] UNIQUE NONCLUSTERED ([MACAddress] ASC)
);

CREATE NONCLUSTERED INDEX [IX_FK_UserDevice]
    ON [dbo].[Devices]([UserId] ASC);

CREATE TABLE [dbo].[Users] (
    [Id]    INT            IDENTITY (1, 1) NOT NULL,
    [Email] NVARCHAR (250) NOT NULL,
    [Sex]   TINYINT        NOT NULL,
    [Age]   SMALLINT       NOT NULL,
    [PhoneNumber] NCHAR (10)     NOT NULL DEFAULT '' ,
    [Name] NVARCHAR(100) NOT NULL DEFAULT '', 
    CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC),
    CONSTRAINT [IX_Unique_Email_PhoneNumber] UNIQUE NONCLUSTERED ([Email] ASC, [PhoneNumber] ASC)
);

I rebuild indexes and update statistics on a weekly basis. Azure SQL DB has no performance recommendations.

Any ideas on how to solve this problem without simply throwing more Azure hardware at it? I'm open to anything including Azure-level changes, SQL changes, code changes. It doesn't appear that there's a consumption model of pricing for Azure SQL DB, which may help me if it existed.

If you don't need up to the nanosecond information, you could create a summary table and update it at an appropriate interval. — Dan Bracuk
Can't you create a (background) process that renders the report at certain intervals so it is already available upon request? — Peter Bons

M.Ali M.Ali · Accepted Answer · 2018-04-26T11:58:17

I would suggest creating following indexes or add the missing columns into your exiting Indexes.

CREATE NONCLUSTERED INDEX [NIX_Session_Device_OrganizationId]
ON [dbo].[Sessions] ([DeviceId] , [OrganizationId]);


CREATE NONCLUSTERED INDEX [NIX_Device_ID_UserID]
ON [dbo].[Devices] ([Id], [userid]);


CREATE NONCLUSTERED INDEX [NIX_Organizations]
ON [dbo].[Organizations] ([Id] , [Name]);

200 DTUs isn't a big number, 2oo DTUs mean you are already on S4 service level, anything above will put you in S6.

First try to tune your query with appropriate indexes, once that done then start looking at DTUs, and really for a mission critical system I would prefer to go with vCore pricing model rather than juggling with the blackbox of DTUs.

How to accommodate Azure SQL query that runs infrequently but is very resource intensive

2 Answers