I'm doing a student project involving building and querying a Cassandra data cluster.
When my cluster load was light (around 30 GB), my queries ran without a problem, but now that it's quite a bit bigger (about 0.5 TB), my queries are timing out.
I thought that this problem might arise, so before I began generating and loading test data I had changed this value in my cassandra.yaml file:
request_timeout_in_ms (Default: 10000) The default timeout for other, miscellaneous operations.
However, when I changed that value to 1000000, Cassandra seemingly hung on startup, though that could have just been the large timeout at work.
My goal for data generation is 2 TB. How do I query a data set that large without running into timeouts?
Queries:
SELECT huntpilotdn
FROM project.t1
WHERE (currentroutingreason, orignodeid, origspan,
       origvideocap_bandwidth, datetimeorigination)
      > (1,1,1,1,1)
  AND (currentroutingreason, orignodeid, origspan,
       origvideocap_bandwidth, datetimeorigination)
      < (1000,1000,1000,1000,1000)
LIMIT 10000
ALLOW FILTERING;

SELECT destcause_location, destipaddr
FROM project.t2
WHERE datetimeorigination = 110
  AND num >= 11612484378506
  AND num <= 45880092667983
LIMIT 10000;

SELECT origdevicename, duration
FROM project.t3
WHERE destdevicename IN ('a','f', 'g')
LIMIT 10000
ALLOW FILTERING;
I have a demo keyspace with the same schemas but a far smaller data size (~10 GB), and these queries run just fine in that keyspace.
All of the queried tables have millions of rows and around 30 columns per row.