I have a DSE cluster where every node in the cluster has both spark and Cassandra running.
When I load data from Cassandra to spark Rdd and do some action on the rdd, i know the data would be distributed into multi nodes. In my case, I want to write these rdds from every node to its local Cassandra dB table directly, is there anyway to do it.
If I do normal rdd collect, all data from spark nodes would be merged and go back to node with driver. I do not want this to happen as the data flow from nodes back to driver node may take Long time, I want the data been save to local node directly to avoid the data movement across the spark nodes.