I have a use case where I need to continuously listen to a Kafka topic and, from a Spark Streaming app, write to 2000 column families (15 columns each, time series data), with the target column family chosen by a column value. I have a local Cassandra installation set up. Creating these column families takes around 1.5 hours on a CentOS VM with 3 cores and 12 GB of RAM. In my Spark Streaming app I do some preprocessing before storing the stream events in Cassandra. I'm running into issues with how long the streaming app takes to complete this.
I was trying to save 300 events to multiple column families (roughly 200-250) based on key, and my app takes around 10 minutes to save them. This seems strange, because printing the same events to the screen grouped by key takes less than a minute; it is only the save to Cassandra that is slow.
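To make the "grouped by key" routing concrete, here is a minimal sketch in plain Python rather than Spark. It is not my actual streaming code: the field name `key`, the sample events, and the `events_<key>` table naming scheme are all assumptions for illustration.

```python
from collections import defaultdict


def table_for(key: str) -> str:
    # Hypothetical naming scheme: one column family per key value.
    return f"events_{key}"


def group_by_table(events):
    """Group event dicts by destination table, one list of rows per table."""
    batches = defaultdict(list)
    for event in events:
        batches[table_for(event["key"])].append(event)
    return dict(batches)


# Made-up sample events; only the routing logic matters here.
events = [
    {"key": "a", "ts": 1, "value": 10.0},
    {"key": "b", "ts": 1, "value": 11.5},
    {"key": "a", "ts": 2, "value": 10.2},
]

batches = group_by_table(events)
for table, rows in sorted(batches.items()):
    print(table, len(rows))
```

With 300 events spread over 200-250 column families, most of these groups hold only one or two rows, so each save touches a different table.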
I have had no issues saving on the order of 3 million records to Cassandra. That took less than 3 minutes (but it was to a single column family).
My requirement is to be as close to real time as possible, and this is nowhere near that. The production environment would have roughly 400 events every 3 seconds.
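To put the numbers above side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted in this post, which are approximate ("around 10 minutes", "less than 3 minutes"), so the results are rough estimates rather than measurements.

```python
# Observed: 300 events saved in about 10 minutes across ~200-250 column families.
observed_rate = 300 / (10 * 60)

# Required in production: roughly 400 events every 3 seconds.
required_rate = 400 / 3

# Single column family: ~3 million records in under 3 minutes (a lower bound on the rate).
single_table_rate = 3_000_000 / (3 * 60)

# How many times faster the multi-table write path would need to be.
shortfall = required_rate / observed_rate

print(f"observed:      {observed_rate:.1f} events/s")
print(f"required:      {required_rate:.1f} events/s")
print(f"single table:  {single_table_rate:.1f} events/s")
print(f"shortfall:     {shortfall:.1f}x")
```

So the multi-table path is running at about 0.5 events/s against a requirement of about 133 events/s, while the single-table path on the same machine managed well over 16,000 events/s.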
Is there any tuning I need to do in Cassandra's YAML file, or any change I need to make to the cassandra-connector itself?
INFO 05:25:14 system_traces.events 0,0
WARN 05:25:14 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:14 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO 05:25:16 ParNew GC in 340ms. CMS Old Gen: 1308020680 -> 1454559048; Par Eden Space: 251658240 -> 0;
WARN 05:25:16 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:16 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO 05:25:17 ParNew GC in 370ms. CMS Old Gen: 1498825040 -> 1669094840; Par Eden Space: 251658240 -> 0;
WARN 05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:18 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO 05:25:19 ParNew GC in 382ms. CMS Old Gen: 1714792864 -> 1875460032; Par Eden Space: 251658240 -> 0;
W
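For reading the log above, a small sketch like the following pulls out the tombstone-to-live ratio per schema table and the growth of the CMS old generation per ParNew collection. The regular expressions are my own assumptions based only on the line shapes shown here, and the sample text is three lines copied from the log.

```python
import re

# Patterns inferred from the log lines in this post, not from any Cassandra spec.
TOMBSTONE_RE = re.compile(r"Read (\d+) live and (\d+) tombstoned cells in (\S+)")
GC_RE = re.compile(r"CMS Old Gen: (\d+) -> (\d+)")

sample = """\
WARN 05:25:14 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:14 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO 05:25:16 ParNew GC in 340ms. CMS Old Gen: 1308020680 -> 1454559048; Par Eden Space: 251658240 -> 0;
"""


def summarize(log_text):
    """Return ({table: tombstoned/live ratio}, [old-gen growth in bytes per GC line])."""
    ratios = {}
    old_gen_growth = []
    for line in log_text.splitlines():
        match = TOMBSTONE_RE.search(line)
        if match:
            live, dead, table = int(match.group(1)), int(match.group(2)), match.group(3)
            ratios[table] = dead / live
            continue
        match = GC_RE.search(line)
        if match:
            old_gen_growth.append(int(match.group(2)) - int(match.group(1)))
    return ratios, old_gen_growth


ratios, growth = summarize(sample)
for table, ratio in ratios.items():
    print(f"{table}: {ratio:.2f} tombstoned cells per live cell")
for delta in growth:
    print(f"old gen grew by {delta} bytes")
```

On these lines, both schema tables show roughly two tombstoned cells for every live cell, and the old generation grows by about 146 MB in that one collection.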