I just read the DataStax post "Basic Rules of Cassandra Data Modeling" and, to sum up, we should model our database schema around our queries, not around our relations/objects. So several tables can hold the same duplicated data, for example users_by_email and users_by_username, which both contain the same data.
How should I handle an update to an object?
For example, if a user edits his email, do I UPDATE both tables manually, or do I just INSERT the object again with all its columns and not care about the previous data (which is still in my database, but with a wrong column value, namely the old email)?
In the case of UPDATE, how can I handle data synchronization?
Currently I'm doing it manually, but is there a tool to help me? I could easily end up with 5 or 6 tables with different partition/clustering keys.
I have heard that Hadoop or Apache Spark can do it.
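To make "manually" concrete, here is a minimal sketch of the kind of thing I mean. The schema is an assumption for illustration only: users_by_username is keyed by username, users_by_email is keyed by email, and both carry an age column. The helper name email_change_batch is hypothetical, and the %s placeholders are just one common driver style. Since email is the partition key of users_by_email, that table cannot be updated in place, so the sketch deletes the old row and inserts a new one.

```python
# Assumed (hypothetical) schema, not taken from the DataStax post:
#   users_by_username (username PRIMARY KEY, email, age)
#   users_by_email    (email PRIMARY KEY, username, age)

def email_change_batch(username, old_email, new_email, age):
    """Build one logged batch that applies an email change to both tables.

    Returns the CQL text and the flat tuple of bind values, in order.
    """
    statements = [
        # email is a regular column here, so an in-place UPDATE is enough
        ("UPDATE users_by_username SET email = %s WHERE username = %s",
         (new_email, username)),
        # email is the partition key here, so the old row must be removed ...
        ("DELETE FROM users_by_email WHERE email = %s",
         (old_email,)),
        # ... and the user rewritten under the new key
        ("INSERT INTO users_by_email (email, username, age) VALUES (%s, %s, %s)",
         (new_email, username, age)),
    ]
    body = "\n".join("  " + cql + ";" for cql, _ in statements)
    cql_text = "BEGIN BATCH\n" + body + "\nAPPLY BATCH;"
    params = tuple(value for _, values in statements for value in values)
    return cql_text, params


if __name__ == "__main__":
    cql_text, params = email_change_batch("alice", "a@old.com", "a@new.com", 30)
    print(cql_text)
    print(params)
```

With 5 or 6 tables this list of statements grows for every kind of change, which is exactly the bookkeeping I would like a tool to take over.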