2
votes

In my stored procedure, I need "Unique" values of one of the columns. I am not sure if I should and if I should, what type of Index I should apply on the table for better performance. No being very specific, the same case happens when I retrieve distinct values of multiple columns. The column is of String(NVARCHAR) type.

e.g.

select DISTINCT Column1 FROM Table1;

OR

select DISTINCT Column1, Column2, Column3 FROM Table1;

2
I don't think an index will help these queries because every record needs to be scanned to make sure that a given column or set of columns is distinct.Tim Biegeleisen
Got it. Thanks.Harsh

2 Answers

2
votes

An index on these specific columns could improve performance by a bit, but just because it will require SQL Server to scan less data (just these specific columns, nothing else). Other than that - a SCAN will always be done. An option would be to create indexed view if you need distinct values from that table.

CREATE VIEW Test
WITH SCHEMABINDING
AS
SELECT Column1, COUNT_BIG(*) AS UselessColumn
FROM Table1
GROUP BY Column1;
GO
CREATE UNIQUE CLUSTERED INDEX PK_Test ON Test (Column1);
GO

And then you can query it like that:

SELECT *
FROM Test WITH (NOEXPAND);

NOEXPAND is a hint needed for SQL Server to not expand query in a view and treat it as a table. Note: this is needed for non Enterprise version of SQL Server only.

2
votes

I recently had the same issue and found it could be overcome using a Columnstore index:

    CREATE NONCLUSTERED COLUMNSTORE INDEX [CI_TABLE1_Column1] ON [TABLE1]
    ([Column1])
    WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0)