6
votes

I have a Clustered Columnstore Index Table for our IOT metrics (timeseries data). It contains more than 1 billion rows and structured like this:

CREATE TABLE [dbo].[Data](
[DeviceId] [bigint] NOT NULL,
[MetricId] [smallint] NOT NULL,
[TimeStamp] [datetime2](2) NOT NULL,
[Value] [real] NOT NULL
)

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([TimeStamp],[DeviceId],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

There are some 10,000 distinct DeviceId values and TimeStamps range from 2008 till now. A typical query against this table looks like this:

SET STATISTICS TIME, IO ON
SELECT
    [DeviceId]
    ,[MetricId]
    ,DATEADD(hh, DATEDIFF(day, '2005-01-01', [TimeStamp]), '2005-01-01') As [Date]
    ,MIN([Value]) as [Min]
    ,MAX([Value]) as [Max]
    ,AVG([Value]) as [Avg]
    ,SUM([Value]) as [Sum]
    ,COUNT([Value]) as [Count]
FROM
    [dbo].[Data]
WHERE
    [DeviceId] = 6077129891325167032
    AND [MetricId] = 1000
    AND [TimeStamp] BETWEEN '2017-07-01' AND '2017-07-30'
GROUP BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])
ORDER BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])

When I execute this query, I get this for performance metrics:

Because at the moment a query like stated above does too many Segment reads I believe:

Table 'Data'. Scan count 2, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 5257, lob physical reads 9, lob read-ahead reads 4000.
Table 'Data'. Segment reads 11, segment skipped 764.

Query plan: Query plan

This is not well optimized I believe as there were 11 segments read to retrieve only 212 out of 1 billion source rows (before grouping/aggregation)

So then I ran Niko Neugebauer's great scripts to validate our setup and the Columnstore Alignment https://github.com/NikoNeugebauer/CISL/blob/master/Azure/alignment.sql, I get this result after rebuilding the Columnstore Clustered Index:

Columnstore Alignment

MetricId and TimeStamp columns have optimal alignment score of 100%. How can we ensure that the DeviceId column is also well aligned? I played with the column order in the initial Clustered (Rowstore) index, is that where things can be optimized?

2
please paste query plan as xml as well - TheGameiswar
if you are using sql server 2016,try to use dbcc clonedb and share the db,so that others can repro the exact scenario you are facing.if you are not using 2016, you can script our table schema,indexes,stats and try sharing the script - TheGameiswar
@TheGameiswar DBCC CLONEDATABASE is available from SQL Server 2014 SP2 onwards : ) - wBob
@wBob:great ,thanks for the info - TheGameiswar
how about creating the missing index.? - TheGameiswar

2 Answers

9
votes

the key solution to align your table by DeviceId is to build a clustered rowstore index on your table and then build over it a clustered Columnstore Index with MAXDOP = 1 (in order not to introduce any overlaps which take place when index build runs with multiple cores). So the possible code will look something like this:

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([DeviceId],[TimeStamp],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

Another possibility would be to do it all within CISL, by preparing and then executing the alignment funcitonality:

insert into dbo.cstore_Clustering( TableName, Partition, ColumnName )
    VALUES ('[dbo].[Data]', 1, 'DeviceId' );

This is though just for 1 partition, but you should consider partitioning your table anyway, once you get into the numbers you are using. After setting up, you can start executing dbo.cstore_doAlignment, which will automatically re-align and optimise your table. (You will have some parameters to configure the threshold of the optimisations, if you like)

Best regards, Niko

0
votes

When Max dop set as 1 for creating clustered column store will sort your records effectively, but for a table with 1 billion rows this max dop 1 will not help u. It is better to partition the table with any of the date columns and then create a clustered column store index with max dop any thing higher than 1 or 0. But in that case sorting will not be guaranteed but clustered column store index will do the segment elimination effectively. One should clearly noted, do not drop or create clustered column store index by keeping any other non clustered index in your table, this will impact your clustered column store index creation/drop performance. If you drop a clustered column store index by keeping other index SQL server will do lot of works on the other indexes.