This one has me stumped. I have a dimension table with around 30 million rows in it. It is a clustered columnstore. Also, this table has a primary key constraint of type INT, on its surrogate key.
A query to retrieve the MIN() of the surrogate key, for a given date range, looks like this:
SELECT
MIN(DIM.OrderId)
FROM
dbo.Dim_Order AS DIM
WHERE
DIM.OrderDate >= CAST('2016-06-01' AS DATE)
AND DIM.OrderDate < CAST('2016-07-01' AS DATE)
OPTION (MAXDOP 1);
Here is the output:
Table 'Dim_Order'. Scan count 2, logical reads 833, physical reads 0, read-ahead reads 0, lob logical reads 1702561, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim_Order'. Segment reads 304001, segment skipped 0.
(1 row affected)
SQL Server Execution Times: CPU time = 2829 ms, elapsed time = 2876 ms.
Instead of using the columnstore, the optimizer is choosing to use the non-clustered primary key and perform Key Lookups through a Nested Loop. To make matters worse, it severely underestimates the number of rows returned.
Strangely, the row estimate seems to be inversely proportional to the size of the date range.
╔════════════╦══════════════════════════╗
║ Date Range ║ Estimated Number of Rows ║
╠════════════╬══════════════════════════╣
║ 1 year ║ 2.00311 ║
║ 6 months ║ 3.41584 ║
║ 1 month ║ 24.4459 ║
║ 2 weeks ║ 52.093 ║
║ 1 week ║ 99.9055 ║
║ 3 days ║ 217.632 ║
║ 1 day ║ 1088.16 ║
╚════════════╩══════════════════════════╝
This version, with an INDEX hint, runs almost instantly:
SELECT
MIN(DIM.OrderId)
FROM
dbo.Dim_Order AS DIM WITH(INDEX=CCI_Dim_Order)
WHERE
DIM.OrderDate >= CAST('2016-06-01' AS DATE)
AND DIM.OrderDate < CAST('2016-07-01' AS DATE)
OPTION (MAXDOP 1);
Table 'Dim_Order'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 1004, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim_Order'. Segment reads 2, segment skipped 0.
(1 row affected)
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
I have observed this behavior on the following versions:
Microsoft SQL Server 2016 (RTM) - 13.0.1601.5 (X64)
Microsoft SQL Server 2016 (SP1-CU5) (KB4040714) - 13.0.4451.0 (X64)
The repro script below will create an example table and populate it with 2 years of orders, for 2,000 customers, one order per day. This works out to 1,462,000 sample orders in our table, spanning 24 months, with each month having approximately 60,000 rows. The sample queries at the bottom of the script are intended to demonstrate the behavior. As you will see, for some reason, the row estimate is very low, and the optimizer refuses to use the clustered columnstore unless prompted.
I appreciate any input or advice on this. Here is the sample script.
DROP TABLE IF EXISTS dbo.Dim_Order
CREATE TABLE dbo.Dim_Order
(
OrderId INT NOT NULL
, CustomerId INT NOT NULL
, OrderDate DATE NOT NULL
, OrderTotal decimal(5,2) NOT NULL
);
WITH CTE_DATE AS
(
SELECT CAST('2016-01-01' AS DATE) AS DateValue
UNION ALL
SELECT
DATEADD(DAY, 1, D.DateValue)
FROM
CTE_DATE AS D
WHERE
D.DateValue < CAST('2017-12-31' AS DATE)
),
CTE_CUSTOMER AS
(
SELECT 1 AS CustomerId
UNION ALL
SELECT
CustomerId + 1
FROM
CTE_CUSTOMER AS D
WHERE
D.CustomerId < 2000
)
, CTE_FINAL
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY DateValue ASC, CustomerId ASC) AS OrderId
, CustomerId
, DateValue AS OrderDate
, CAST(ROUND(RAND(CHECKSUM(NEWID()))*(100-1)+1, 2) AS DECIMAL(5,2)) AS OrderTotal
FROM
CTE_DATE
CROSS JOIN CTE_CUSTOMER
)
INSERT INTO
dbo.Dim_Order
(
OrderId
, CustomerId
, OrderDate
, OrderTotal
)
SELECT
ORD.OrderId
, ORD.CustomerId
, ORD.OrderDate
, ORD.OrderTotal
FROM
CTE_FINAL AS ORD
OPTION (MAXRECURSION 32767);
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Dim_Order ON dbo.Dim_Order;
ALTER INDEX CCI_Dim_Order ON dbo.Dim_Order
REORGANIZE
WITH (COMPRESS_ALL_ROW_GROUPS = ON)
ALTER TABLE dbo.Dim_Order
ADD CONSTRAINT PK_Dim_Order PRIMARY KEY NONCLUSTERED (OrderId ASC);
RETURN;
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT
MIN(DIM.OrderId)
FROM
dbo.Dim_Order AS DIM
WHERE
DIM.OrderDate = CAST('2016-06-01' AS DATE)
AND DIM.OrderDate < CAST('2016-07-01' AS DATE)
OPTION (MAXDOP 1);
SELECT
MIN(DIM.OrderId)
FROM
dbo.Dim_Order AS DIM WITH(INDEX=CCI_Dim_Order)
WHERE
DIM.OrderDate >= CAST('2016-06-01' AS DATE)
AND DIM.OrderDate < CAST('2016-07-01' AS DATE)
OPTION (MAXDOP 1);