0
votes

I am reading data from a table which has the following structure and indexes


SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Bets](
    [bwd_BetTicketNr] [bigint] NOT NULL,
    [bwd_LineID] [int] NOT NULL,
    [bwd_ResultID] [bigint] NOT NULL,
    [bwd_LineStake] [bigint] NULL,
    [bwd_CreatedAt] [datetime] NULL,
    [bwd_DateModified] [datetime] NULL,
 CONSTRAINT [PK_BetwayDetails] PRIMARY KEY CLUSTERED 
(
    [bwd_BetTicketNr] ASC,
    [bwd_LineID] ASC,
    [bwd_ResultID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [idx___Bets__bwd_CreatedAt] ON [dbo].[Bets]
(
    [bwd_CreatedAt] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


CREATE NONCLUSTERED INDEX [idx___Bets__bwd_DateModified] ON [dbo].[Bets]
(
    [bwd_DateModified] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX [nccs___Bets] ON [dbo].[Bets]
(
    [bwd_BetTicketNr]
)WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO

I would like to understand the decision of the table developer to use non-clustered columnstore index on the bwd_BetticketNr column and not a classic row-store, as in the date columns.

The production table is ~6 billion rows and unique bwd_Betticketnr values are ~50million

Running queries on test tables up to 50mil rows using row-store and columnstore has similar performance therefore I cannot emulate the scaling. Is columnstore more suitable due to data-type and cardinality?

I tried to find similar questions/posts/blogs making such comparison but I have not found anything yet.

I am using SQL Server 2017.

2
One nice side effect of having this is that you can get batchmode even on queries not using the columnstore index (batch mode on rowstore.is available without this in 2019). But indexes added just for that reason will usually be filteredMartin Smith

2 Answers

2
votes

Columnstore indexes are optimized for compression and scan speed, and should show significantly reduced logical IO and CPU for queries that require an index scan.

A likely target here is:

select count(*) from bets

which would otherwise scan one of the nonclustered indexes.

or

select count(*) from bets where bwd_BetTicketNr = @tn

which would otherwise do a partial range scan of the clustered index.

eg

SET QUOTED_IDENTIFIER ON
GO
--drop table if exists bets
go

CREATE TABLE [dbo].[Bets](
    [bwd_BetTicketNr] [bigint] NOT NULL,
    [bwd_LineID] [int] NOT NULL,
    [bwd_ResultID] [bigint] NOT NULL,
    [bwd_LineStake] [bigint] NULL,
    [bwd_CreatedAt] [datetime] NULL,
    [bwd_DateModified] [datetime] NULL,
 CONSTRAINT [PK_BetwayDetails] PRIMARY KEY CLUSTERED 
(
    [bwd_BetTicketNr] ASC,
    [bwd_LineID] ASC,
    [bwd_ResultID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [idx___Bets__bwd_CreatedAt] ON [dbo].[Bets]
(
    [bwd_CreatedAt] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


CREATE NONCLUSTERED INDEX [idx___Bets__bwd_DateModified] ON [dbo].[Bets]
(
    [bwd_DateModified] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX [nccs___Bets] ON [dbo].[Bets]
(
    [bwd_BetTicketNr]
)WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO

with 
n as 
(
    select top (10*1000*1000) row_number() over (order by (select null)) i
    from sys.objects o, sys.messages m

), q as
(
  select 
    i [bwd_BetTicketNr] 
   ,i [bwd_LineID] 
   ,i [bwd_ResultID] 
   ,i [bwd_LineStake]
   ,getdate() [bwd_CreatedAt]
   ,getdate() [bwd_DateModified]
   from n
)
insert into Bets
select * from q
alter index all on Bets rebuild with (online=off)

set statistics io on
set statistics time on

select count(*) from bets with (index=[idx___Bets__bwd_DateModified])

set statistics io off
set statistics time off
/*
Table 'Bets'. Scan count 1, logical reads 42229, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 563 ms,  elapsed time = 563 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
*/

set statistics io on
set statistics time on

select count(*) from bets with (index=[nccs___Bets])

set statistics io off
set statistics time off
/*
Table 'Bets'. Scan count 2, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 9816, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Bets'. Segment reads 10, segment skipped 0.

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 19 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
*/
1
votes

In general, a columnstore index will be faster than an index scan on a rowstore index -- even when the rowstore is created with ROW or PAGE compression. (NOTE: The degree of parallelism (DOP) for batch mode operations is limited to 2 for SQL Server Standard Edition and 1 for SQL Server Web and Express Editions. This refers to columnstore indexes created over disk-based tables and memory-optimized tables.) I have seen some examples where a columnstore index scan using two threads requires more elapsed time than a rowstore index scan using a larger number of threads (8, for instance). However, the worker time for the columnstore index may actually be less.

Before we ask "Why?", let's see what the database engine has to say...

Is The Index Used?
Index usage stats can tell us if an index is being used for seeks, scans, or lookups. This query will tell you what's happening with the indexes on dbo.Bets. Columnstore indexes are not used for seeks, so pay close attention to user_scans in the results.

USE [your database]
GO

SELECT i.name, ius.*
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats ius
    ON ius.object_id = i.object_id
    AND ius.index_id = i.index_id
    AND ius.database_id = DB_ID()
WHERE i.object_id = object_id('dbo.Bets')

How Is The Index Used?
Assuming the number of index scans for the columnstore index is greater than zero, you can search the plan cache for execution plans that use the index. Try this query, which is from Jonathan Kehayias' article Finding what queries in the plan cache use a specific index:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE @IndexName AS NVARCHAR(128) = 'nccs___Bets';

-- Make sure the name passed is appropriately quoted
IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName);
-- Handle the case where the left or right was quoted manually but not the opposite side
IF LEFT(@IndexName, 1) <> '[' SET @IndexName = '['+@IndexName;
IF RIGHT(@IndexName, 1) <> ']' SET @IndexName = @IndexName + ']';

-- Dig into the plan cache and find all plans using this index
;WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')   
SELECT
stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQL_Text,
obj.value('(@Database)[1]', 'varchar(128)') AS DatabaseName,
obj.value('(@Schema)[1]', 'varchar(128)') AS SchemaName,
obj.value('(@Table)[1]', 'varchar(128)') AS TableName,
obj.value('(@Index)[1]', 'varchar(128)') AS IndexName,
obj.value('(@IndexKind)[1]', 'varchar(128)') AS IndexKind,
cp.plan_handle,
query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//IndexScan/Object[@Index=sql:variable("@IndexName")]') AS idx(obj)
OPTION(MAXDOP 1, RECOMPILE);

So, back to the original question: Why did the developer decide on those indexes? If the columnstore index is in use, an examination of the related query plans and their underlying queries should be fairly revealing.