How to optimize SQL Server Columnstore Alignment

Question

I have a Clustered Columnstore Index Table for our IOT metrics (timeseries data). It contains more than 1 billion rows and structured like this:

CREATE TABLE [dbo].[Data](
[DeviceId] [bigint] NOT NULL,
[MetricId] [smallint] NOT NULL,
[TimeStamp] [datetime2](2) NOT NULL,
[Value] [real] NOT NULL
)

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([TimeStamp],[DeviceId],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

There are some 10,000 distinct DeviceId values and TimeStamps range from 2008 till now. A typical query against this table looks like this:

SET STATISTICS TIME, IO ON
SELECT
    [DeviceId]
    ,[MetricId]
    ,DATEADD(hh, DATEDIFF(day, '2005-01-01', [TimeStamp]), '2005-01-01') As [Date]
    ,MIN([Value]) as [Min]
    ,MAX([Value]) as [Max]
    ,AVG([Value]) as [Avg]
    ,SUM([Value]) as [Sum]
    ,COUNT([Value]) as [Count]
FROM
    [dbo].[Data]
WHERE
    [DeviceId] = 6077129891325167032
    AND [MetricId] = 1000
    AND [TimeStamp] BETWEEN '2017-07-01' AND '2017-07-30'
GROUP BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])
ORDER BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])

When I execute this query, I get this for performance metrics:

Because at the moment a query like stated above does too many Segment reads I believe:

Table 'Data'. Scan count 2, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 5257, lob physical reads 9, lob read-ahead reads 4000.
Table 'Data'. Segment reads 11, segment skipped 764.

Query plan:

This is not well optimized I believe as there were 11 segments read to retrieve only 212 out of 1 billion source rows (before grouping/aggregation)

So then I ran Niko Neugebauer's great scripts to validate our setup and the Columnstore Alignment https://github.com/NikoNeugebauer/CISL/blob/master/Azure/alignment.sql, I get this result after rebuilding the Columnstore Clustered Index:

MetricId and TimeStamp columns have optimal alignment score of 100%. How can we ensure that the DeviceId column is also well aligned? I played with the column order in the initial Clustered (Rowstore) index, is that where things can be optimized?

if you are using sql server 2016,try to use dbcc clonedb and share the db,so that others can repro the exact scenario you are facing.if you are not using 2016, you can script our table schema,indexes,stats and try sharing the script — TheGameiswar
@TheGameiswar DBCC CLONEDATABASE is available from SQL Server 2014 SP2 onwards : ) — wBob

Niko Neugebuer Niko Neugebuer · Accepted Answer · 2017-08-07T10:24:02

the key solution to align your table by DeviceId is to build a clustered rowstore index on your table and then build over it a clustered Columnstore Index with MAXDOP = 1 (in order not to introduce any overlaps which take place when index build runs with multiple cores). So the possible code will look something like this:

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([DeviceId],[TimeStamp],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

Another possibility would be to do it all within CISL, by preparing and then executing the alignment funcitonality:

insert into dbo.cstore_Clustering( TableName, Partition, ColumnName )
    VALUES ('[dbo].[Data]', 1, 'DeviceId' );

This is though just for 1 partition, but you should consider partitioning your table anyway, once you get into the numbers you are using. After setting up, you can start executing dbo.cstore_doAlignment, which will automatically re-align and optimise your table. (You will have some parameters to configure the threshold of the optimisations, if you like)

Best regards, Niko

How to optimize SQL Server Columnstore Alignment

2 Answers