Summary
Write conflicts are something you shouldn't need to worry about here. All INSERTs, UPDATEs, and DELETEs in Cassandra are upserts, and writes are resolved per column rather than per row.
Cassandra uses a last-write-wins strategy to resolve conflicts. As you can see in the example below, whenever you change a value, the timestamp associated with that column is updated. Because your concurrent updates touch different columns (one thread updates col_a and the other updates col_b), they do not overwrite each other.
Example
Initial Insert
cqlsh:test_keyspace> insert into race_condition_test (pk, col_a, col_b ) VALUES ( '1', 'Deckard', 'Blade Runner');
cqlsh:test_keyspace> select * from race_condition_test ;
pk | col_a | col_b
----+---------+--------------
 1 | Deckard | Blade Runner
(1 rows)
Timestamps are the same in the initial insert (writetime() returns the timestamp of a column's last write, in microseconds since the Unix epoch).
cqlsh:test_keyspace> select pk, col_a, writetime(col_a), col_b, writetime(col_b) from race_condition_test ;
pk | col_a | writetime(col_a) | col_b | writetime(col_b)
----+---------+------------------+--------------+------------------
1 | Deckard | 1526916970412357 | Blade Runner | 1526916970412357
(1 rows)
Once col_b is updated, its timestamp changes to reflect the change.
cqlsh:test_keyspace> insert into race_condition_test (pk, col_b ) VALUES ( '1', 'Rick');
cqlsh:test_keyspace> select pk, col_a, writetime(col_a), col_b, writetime(col_b) from race_condition_test ;
pk | col_a | writetime(col_a) | col_b | writetime(col_b)
----+---------+------------------+-------+------------------
1 | Deckard | 1526916970412357 | Rick | 1526917272641682
(1 rows)
After col_a is updated, it too gets its timestamp updated to the new value.
cqlsh:test_keyspace> insert into race_condition_test (pk, col_a) VALUES ( '1', 'bounty hunter');
cqlsh:test_keyspace> select pk, col_a, writetime(col_a), col_b, writetime(col_b) from race_condition_test ;
pk | col_a | writetime(col_a) | col_b | writetime(col_b)
----+---------------+------------------+-------+------------------
1 | bounty hunter | 1526917323082217 | Rick | 1526917272641682
(1 rows)
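For completeness, UPDATE behaves exactly the same way, because every write is an upsert. This statement is equivalent to the last INSERT above and would likewise bump only writetime(col_a):

cqlsh:test_keyspace> update race_condition_test set col_a = 'bounty hunter' where pk = '1';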
Recommendation
My recommendation is to use a single table that serves your query needs. If you need to query by pk, create one table with all the columns you need. That way you have a single wide row that can be read back efficiently as part of a single query.
The data model you describe in option 2 is a bit too relational and is not optimal for Cassandra. You cannot perform joins natively in Cassandra, and you should avoid performing joins on the client side.
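As a rough sketch of that single-table approach (my_entity and the col_* names are placeholders, since I don't know your actual schema):

-- One table that serves the "look everything up by pk" query.
-- Table and column names are illustrative; substitute your own.
CREATE TABLE test_keyspace.my_entity (
    pk    text,
    col_a text,
    col_b text,
    col_c text,
    PRIMARY KEY (pk)
);

-- The whole row comes back in one single-partition read:
SELECT * FROM test_keyspace.my_entity WHERE pk = '1';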
Data Model Rules:
Rule 1: Spread data evenly across the cluster
Choose a partition key that distributes data evenly across the cluster, so that no node becomes a hotspot.
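For example (made-up tables): a high-cardinality partition key such as a user id spreads rows evenly, while a low-cardinality key such as a country code funnels everything into a handful of partitions:

-- Good: user_id has high cardinality, so rows hash evenly
-- across the cluster.
CREATE TABLE users_by_id (
    user_id uuid,
    name    text,
    country text,
    PRIMARY KEY (user_id)
);

-- Risky: only a few distinct countries exist, so a few large
-- partitions (and the nodes that own them) become hotspots.
CREATE TABLE users_by_country (
    country text,
    user_id uuid,
    name    text,
    PRIMARY KEY (country, user_id)
);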
Rule 2: Minimize the number of partitions read
Each partition may reside on a different node, so for performance's sake you should aim for queries that can ideally be answered from a single partition, and therefore a single node.
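For instance (again hypothetical), clustering all of a user's events under one partition key means the query below touches a single partition:

-- All events for one user live in one partition, newest first.
CREATE TABLE events_by_user (
    user_id  uuid,
    event_ts timestamp,
    payload  text,
    PRIMARY KEY (user_id, event_ts)
) WITH CLUSTERING ORDER BY (event_ts DESC);

-- Single-partition read, served by one replica set.
SELECT * FROM events_by_user
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;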
Rule 3: Model around your queries
- Determine what queries to support
- Create a table that satisfies your query (meaning that you should use one table per query pattern).
- If you need to support more query patterns, denormalize your data into additional tables that serve those queries (see the sketch after this list). Avoid secondary indexes and materialized views, as they are not stable at the moment, and secondary indexes in particular can create major performance issues as your cluster grows.
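As a sketch of "one table per query pattern" (tables and columns made up for illustration):

-- Query pattern 1: fetch a user by id.
CREATE TABLE users_by_id (
    user_id uuid,
    email   text,
    name    text,
    PRIMARY KEY (user_id)
);

-- Query pattern 2: fetch a user by email. The same data is
-- denormalized into a second table instead of relying on a
-- secondary index; the application writes to both tables.
CREATE TABLE users_by_email (
    email   text,
    user_id uuid,
    name    text,
    PRIMARY KEY (email)
);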
If you want to read a little more about this, I suggest this DataStax page:
Basic Rules of Cassandra Data Modeling