I am starting with an initial idea of rewriting a mammoth Spark-Kafka-HBase application as Spark-Kafka-Cassandra (on Kubernetes).
I have two candidate data models: one keeps every reading as a new insert, the other upserts in place.
Approach 1:
create table test.inv_positions (
    location_id int,
    item bigint,
    time_id timestamp,
    sales_floor_qty int,
    backroom_qty int,
    in_backroom boolean,
    transit_qty int,
    primary key ((location_id), item, time_id)
) with clustering order by (item asc, time_id desc);
This table keeps inserting new rows because time_id is part of the clustering key. I am thinking of reading the latest row (time_id is clustered descending) with LIMIT 1, as sketched below, and then getting rid of old records by either setting a TTL on them or deleting them overnight.
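For reference, the latest-row read under this model might look like the following (a sketch; the location and item values are placeholders):

-- time_id is clustered desc, so for a fixed (location_id, item)
-- the first row returned is the most recent one.
select * from test.inv_positions
where location_id = 1001 and item = 42
limit 1;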
Concerns: TTL'ing or deleting the old records creates tombstones.
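Both purge options would look roughly like this (a sketch with placeholder values; the 86400-second TTL and the cutoff timestamp are assumptions, and the range delete on a clustering column requires Cassandra 3.0 or later):

-- Option A: write each reading with a TTL so it expires on its own.
-- Each expired cell becomes a tombstone until compaction removes it.
insert into test.inv_positions (location_id, item, time_id, sales_floor_qty)
values (1001, 42, toTimestamp(now()), 25)
using ttl 86400;

-- Option B: overnight cleanup via a range delete on the clustering key.
-- This writes a range tombstone covering all the older rows at once.
delete from test.inv_positions
where location_id = 1001 and item = 42 and time_id < '2020-01-01 00:00:00';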
Approach 2:
create table test.inv_positions (
    location_id int,
    item bigint,
    time_id timestamp,
    sales_floor_qty int,
    backroom_qty int,
    in_backroom boolean,
    transit_qty int,
    primary key ((location_id), item)
) with clustering order by (item asc);
With this table, if a new record arrives for the same location and item, it upserts (overwrites) the existing row, as in the sketch below. It is easy to read, and there is no need to worry about purging old records.
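A minimal sketch of the upsert behavior (placeholder values): in Cassandra an INSERT with an existing primary key simply overwrites the row, last write wins.

-- Same (location_id, item) as an earlier write: this replaces the
-- previous row in place rather than adding a new one.
insert into test.inv_positions (location_id, item, time_id, sales_floor_qty)
values (1001, 42, toTimestamp(now()), 30);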
Concerns: I have another application on Cassandra that updates different columns at different times, and we still have read issues there. That said, upserts can also create tombstones (e.g. when null values are written), but how bad is that compared to Approach 1? Or is there a better way to model this?
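For context, the partial-update pattern I mean looks like this (a sketch; the column choices and values are placeholders). Each statement touches a single column, so the columns of one row end up with different write timestamps and Cassandra merges them at read time:

-- Writer A updates the sales-floor count...
update test.inv_positions set sales_floor_qty = 25
where location_id = 1001 and item = 42;

-- ...while writer B later updates only the backroom count. The row's
-- columns now carry different write timestamps; reads merge them.
update test.inv_positions set backroom_qty = 7
where location_id = 1001 and item = 42;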