1
votes

`

SELECT DISTINCT ECS.UserId,,PFR.EntityId,PFR.CreatedBy
FROM EventConsentStatus ECS 
INNER JOIN @Institutions I ON ECS.InstitutionId=I.InstitutionId             
LEFT JOIN ParentFormResponses PFR ON  PFR.EntityTypeId = 1 
          AND PFR.EntityId=@ActivityId AND ECS.EventId=PFR.EntityId                 
WHERE ECS.EventId = @ActivityId

A table called ParentFormResponses mentioned in the above query is having more than 3lacs records. Table is not having any identity column instead I have created clustered primary index on group of some columns based on uniqueness. But still it is taking more than 16 minutes to execute a simple select statement i.e. select * from ParentFormResponses.

If I am removing column names of ParentFormResponses table from above mentioned select statement then it is showing result in 2 3 seconds but for above query it is taking too much time.

If I create non clustered index on entityid and entitytypeid then also it is not giving optimized result.

Please suggest me how can I improve table structure and query performance.

Detail information: Table Stucture:

`CREATE TABLE [dbo].[ParentFormResponses](
[EntityId] [int] NOT NULL,
[FormId] [int] NOT NULL,
[StudentId] [nvarchar](50) NOT NULL,
[CreatedBy] [nvarchar](50) NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[EntityTypeId] [int] NOT NULL,
[FormVersion] [decimal](18, 1) NOT NULL,
[DigitallySigned] [bit] NULL,
[HasResponseChanged] [bit] NULL,

CONSTRAINT [PK_ParentFormResponses] PRIMARY KEY CLUSTERED ( [EntityId] ASC, [FormId] ASC, [StudentId] ASC, [EntityTypeId] ASC, [FormVersion] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] ) ON [PRIMARY]`

=> I dropped primary key index and created below non clustered index CREATE NONCLUSTERED INDEX [IX_ParentFormResponses_sakshi_EntityId] ON [dbo].[ParentFormResponses_sakshi] ( [EntityId] ASC, [EntityTypeId] ASC, [FormId] ASC, [FormVersion] ASC ) INCLUDE ( [StudentId], [CreatedBy]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO

=> Result of all three tables based on sp_spaceused are-

  • name- rows, reserved, data, index_size, unused
  • ParentFormResponses - 309961, 64704 KB, 63592 KB, 936 KB, 176 KB
  • ParentFormResponses_sakshi- 309893, 117696 KB, 60472 KB, 56944 KB, 280 KB
  • EventConsentStatus - 673796, 380920 KB, 109240 KB, 271512 KB, 168 KB
  • Note: Number of records are not 30k but more than 3 lacs. By mistake I wrote wrongly.

    1
    Do you know what a query plan is? If not - read is up. And post it. That said, start by making the Distinct surplus. That could possibly require a lot of work on the tempdb, but then the query plan will tell you.TomTom
    To select data from such a small table should not take long. It must be the fetching proces taking so long. Are you storing big data (images, videos, etc.)? I assume select count(*) from ParentFormResponses to take some seconds only. Correct? As to the join: an index on EntityId and EntityTypeId as mentioned by you should usually make this fast, at least when there are many different EntityIds.Thorsten Kettner
    Another question: estimated, how many users access table ParentFormResponses when running your query(es)?Eduard Uta
    Post the query plan .. also make sure that non clustered index you have created are actually being used and that statistics are up to date..Deepshikha
    try use group by instead of distinctDudi Konfino

    1 Answers

    1
    votes

    The combination of distinct and left join is suspicious from a performance perspective. First, check the following query:

    SELECT ECS.UserId, ECS.EntityId
    FROM EventConsentStatus ECS INNER JOIN
         @Institutions I
         ON ECS.InstitutionId = I.InstitutionId             
    WHERE ECS.EventId = @ActivityId;
    

    You can optimize this with an index on EventConsentStatus(EventId, UserId). Assuming this has good performance, it will produce the right set of rows. If not, then you may be getting duplicates from @Institions. If this is the case, then consider removing it (no columns are used from that table, it is only for filtering). Or:

    SELECT ECS.UserId, ECS.EntityId
    FROM EventConsentStatus ECS 
    WHERE ECS.EventId = @ActivityId AND
          EXISTS (SELECT 1 FROM  @Institutions I WHERE ECS.InstitutionId = I.InstitutionId)
    

    To get the right values, let's get the additional data using outer apply:

    SELECT ECS.UserId, PFR.EntityId, PFR.CreatedBy
    FROM EventConsentStatus ECS INNER JOIN
         @Institutions I
         ON ECS.InstitutionId = I.InstitutionId OUTER APPLY
         (SELECT TOP 1 PFR.EntityId, PFR.CreatedBy
          FROM ParentFormResponses pfr
          WHERE ECS.EventId = PFR.EntityId  AND
                PFR.EntityTypeId = 1 AND
                PFR.EntityId = @ActivityId
         ) pfr
    WHERE ECS.EventId = @ActivityId;
    

    For this, you want an index on ParentFormResponses(EntityId, EntityTypeID, CreatedBy).