I have three tables in a Cassandra cluster spread across several nodes, with a Spark worker running on each node. Let's call these tables A, B and C.
A and B are huge, but they share the same partition key, so data locality is preserved when I join them.
Now I want to join the third table, C, which has a different partition key but is not as big as the other two. I am willing to replicate C to all of my nodes if I have to.
How do I join all three tables while maintaining data locality and keeping shuffle to a minimum?
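
To make clear what I mean by replicating C, here is a toy sketch in plain Python of the behavior I am after. It is not Spark or Cassandra connector code, and the column names (`pk`, `c_id`, `a_val`, `b_val`, `c_val`) and the two-node layout are made up for illustration. Each node joins its own slice of A and B on the shared partition key, then looks up rows of C in a full local copy, so no A or B rows ever leave their node.

```python
from collections import defaultdict

def local_join(a_part, b_part, c_copy):
    """Join one node's slice of A and B, then enrich from a full local copy of C."""
    # Index this node's B rows by the partition key that A and B share.
    b_by_key = defaultdict(list)
    for row in b_part:
        b_by_key[row["pk"]].append(row)

    # C is small, so every node keeps the whole table as a lookup map.
    c_by_key = {row["c_id"]: row for row in c_copy}

    joined = []
    for a in a_part:
        c = c_by_key.get(a["c_id"])
        if c is None:
            continue  # inner join: skip A rows with no match in C
        for b in b_by_key.get(a["pk"], []):
            joined.append({
                "pk": a["pk"],
                "a_val": a["a_val"],
                "b_val": b["b_val"],
                "c_val": c["c_val"],
            })
    return joined

# Hypothetical layout: two nodes, each holding co-located slices of A and B.
nodes = [
    {"A": [{"pk": 1, "c_id": "x", "a_val": "a1"}], "B": [{"pk": 1, "b_val": "b1"}]},
    {"A": [{"pk": 2, "c_id": "y", "a_val": "a2"}], "B": [{"pk": 2, "b_val": "b2"}]},
]
table_c = [{"c_id": "x", "c_val": "cx"}, {"c_id": "y", "c_val": "cy"}]

# Each node works only on its own data plus its copy of C.
result = [row for node in nodes for row in local_join(node["A"], node["B"], table_c)]
```

In this sketch only C is copied around; what I would like to know is how to get the same effect in Spark on top of Cassandra.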