Cassandra query 2nd index with pagination become slower when data grow

Question

When I query secondary index with pagination, query becomes slower when data grows.
I thought with pagination, no matter how large your data grow, it takes same time to query one page. Is that true? Why my query get slower?

My simplified table is

CREATE TABLE closed_executions (
  domain_id            uuid,
  workflow_id          text,
  start_time           timestamp,
  workflow_type_name   text,
  PRIMARY KEY  ((domain_id), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC)
  AND COMPACTION = {
    'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  }
  AND GC_GRACE_SECONDS = 172800;

And I create a secondary index as

CREATE INDEX closed_by_type ON closed_executions (workflow_type_name);

I query with following CQL

SELECT workflow_id, start_time, workflow_type_name 
FROM closed_executions 
WHERE domain_id = ? 
AND start_time >= ? 
AND start_time <= ? 
AND workflow_type_name = ?

and code

query := v.session.Query(templateGetClosedWorkflowExecutionsByType,
        request.DomainUUID,
        common.UnixNanoToCQLTimestamp(request.EarliestStartTime),
        common.UnixNanoToCQLTimestamp(request.LatestStartTime),
        request.WorkflowTypeName).Consistency(gocql.One)
iter := query.PageSize(request.PageSize).PageState(request.NextPageToken).Iter()
// PageSize is 10, but could be thousand

Environement:

MacBook Pro
Cassandra: 3.11.0
GoCql: github.com/gocql/gocql master

Observation:
10K rows, within second
100K rows, ~3 second
1M rows, ~17 second

Debug log:

INFO  [ScheduledTasks:1] 2018-09-11 16:29:48,349 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
DEBUG [ScheduledTasks:1] 2018-09-11 16:29:48,357 MonitoringTask.java:173 - 1 operations were slow in the last 5005 msecs:
<SELECT * FROM cadence_visibility.closed_executions WHERE workflow_type_name = code.uber.internal/devexp/cadence-bench/load/basic.stressWorkflowExecute AND token(domain_id, domain_partition) >= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND token(domain_id, domain_partition) <= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND start_time >= 2018-09-11 16:29-0700 AND start_time <= 1969-12-31 16:00-0800 LIMIT 10>, time 2747 msec - slow timeout 500 msec
DEBUG [COMMIT-LOG-ALLOCATOR] 2018-09-11 16:31:47,774 AbstractCommitLogSegmentManager.java:107 - No segments in reserve; creating a fresh one
DEBUG [ScheduledTasks:1] 2018-09-11 16:40:22,922 ColumnFamilyStore.java:899 - Enqueuing flush of size_estimates: 23.997MiB (2%) on-heap, 0.000KiB (0%) off-heap

Related ref (no answer for my questions):

-- Edit tablestats returns

Total number of tables: 105

----------------
Keyspace : cadence_visibility
    Read Count: 19
    Read Latency: 0.5125263157894736 ms.
    Write Count: 3220964
    Write Latency: 0.04900822269357869 ms.
    Pending Flushes: 0
        Table: closed_executions
        SSTable count: 1
        SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
        Space used (live): 20.3 MiB
        Space used (total): 20.3 MiB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 6.35 KiB
        SSTable Compression Ratio: 0.40192660515179696
        Number of keys (estimate): 3
        Memtable cell count: 28667
        Memtable data size: 7.35 MiB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 9
        Local read count: 9
        Local read latency: NaN ms
        Local write count: 327024
        Local write latency: NaN ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 16 bytes
        Bloom filter off heap memory used: 8 bytes
        Index summary off heap memory used: 38 bytes
        Compression metadata off heap memory used: 6.3 KiB
        Compacted partition minimum bytes: 150
        Compacted partition maximum bytes: 62479625
        Compacted partition mean bytes: 31239902
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): 0
        Dropped Mutations: 0 bytes

----------------

Can you include cfstats of your index? Most likely your too low cardinality with workflow_type_name. Really you should avoid secondary indexes all together unless theres a strong need for consistency that requires it that cant be met without it. Scatter gather queries are never going to scale well. — Chris Lohfink
Thanks @ChrisLohfink for reply, not sure if tablestats is what you mentioned as cfstats, but I pasted it at end of my original post. My current understanding about my query is, Cassandra load all data from partition by WHERE domain_id = ? AND start_time >= ? AND start_time <= ? , say 100k rows; then filtered by local index workflow_type_name = ? say 50k rows; then pagination return pagesize 10 rows from 50k rows. Is my understanding true? If so, why pagination didn't start earlier, so that filtered by local index step just return 10 rows? — vancexu

vancexu vancexu · Accepted Answer · 2018-09-19T19:03:48

Why pagination doesn't scale as the main table?
Your data in your secondary index is disperse pagination will only apply logic until it hits the page number since your data is not clustered by time you still have to sift through lots and lots of rows before you can find your first 10 for example .

Query Tracing do show pagination plays at the very late phase.

Why secondary index is slow?
First Cassandra reads the index table to retrieve the primary key of all matching rows and for each of them, it will read the original table to fetch out the data. It is known anti-patterns with low cardinality index. (reference https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive)

Cassandra query 2nd index with pagination become slower when data grow

1 Answers