Hi, I am using DataStax Enterprise for Hadoop and Cassandra integration. I have configured 3 Cassandra nodes and 2 analytics nodes (on which Hive will run).
So I am confused: if some data is present on the Cassandra nodes but not on the Hive nodes, will it be skipped during MapReduce, or will MapReduce pull that data from the Cassandra nodes and then run the job? Please help.
So I have 4 machines (replication factor 3):
machine 1) Cassandra node | token = 0 | data owned: 25%
machine 2) Cassandra node | token = 2^127 * 0.5 | data owned: 33%
machine 3) analytics node | token = 2^127 * 0.25 | data owned: 33%
machine 4) analytics node | token = 2^127 * 0.75 | data owned: 8%
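As a rough check on the tokens listed above, here is a minimal sketch that computes each node's primary range on a RandomPartitioner-style ring (token space of roughly 0 to 2^127). It assumes a single ring in which each node owns the span from the previous token (exclusive) up to its own token (inclusive), and it deliberately ignores data centers and replication. The labels and the `ownership` helper are hypothetical, for illustration only.

```python
from fractions import Fraction

RING_SIZE = 2**127  # assumed RandomPartitioner token space

# Tokens from the list above; labels are illustrative only
tokens = {
    "machine 1 (cassandra)": 0,
    "machine 2 (cassandra)": RING_SIZE // 2,
    "machine 3 (analytics)": RING_SIZE // 4,
    "machine 4 (analytics)": RING_SIZE * 3 // 4,
}

def ownership(tokens, ring_size=RING_SIZE):
    """Return the fraction of the ring each node is the primary owner of."""
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    result = {}
    for i, (name, token) in enumerate(ordered):
        # For i == 0, ordered[-1] wraps around to the node with the highest token
        prev_token = ordered[i - 1][1]
        # Span from the previous token up to this one; a lone node owns the whole ring
        span = (token - prev_token) % ring_size or ring_size
        result[name] = Fraction(span, ring_size)
    return result

for name, frac in ownership(tokens).items():
    print(f"{name}: {float(frac):.0%}")  # each line prints 25%
```

Under this simple model the evenly spaced tokens give every machine 25%. If a tool reports different percentages, the gap likely reflects something this sketch leaves out, such as how nodes are grouped into data centers.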
Shouldn't they each own 25%? Also, I now think the data will be replicated to all nodes, not just to 3 nodes.
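To illustrate what replication factor 3 means on a 4-node ring, here is a minimal sketch of SimpleStrategy-style placement: the first copy goes to the node whose token is the first one at or after the key's token (wrapping around), and the remaining copies go to the next nodes clockwise. This is an assumption for illustration. DSE deployments often place analytics nodes in a separate data center and use NetworkTopologyStrategy with per-data-center replication factors, in which case actual placement may differ. The `replicas` helper and the labels are hypothetical.

```python
import bisect
from collections import Counter

RING_SIZE = 2**127  # assumed RandomPartitioner token space

# (token, node) pairs sorted around the ring, using the tokens listed above
ring = sorted([
    (0, "machine 1"),
    (RING_SIZE // 4, "machine 3"),
    (RING_SIZE // 2, "machine 2"),
    (RING_SIZE * 3 // 4, "machine 4"),
])

def replicas(key_token, ring, rf):
    """Nodes holding a key under SimpleStrategy-style placement (single ring)."""
    tokens = [t for t, _ in ring]
    # Primary replica: first node whose token >= key token, wrapping to the start
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    # Remaining rf - 1 copies go to the following nodes clockwise
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]

# Count how many of the 4 equal-sized primary ranges each node stores a copy of
held = Counter()
for token, _ in ring:
    for node in replicas(token, ring, 3):
        held[node] += 1

print(replicas(RING_SIZE // 3, ring, 3))  # ['machine 2', 'machine 4', 'machine 1']
print(dict(held))  # every node holds 3 of the 4 ranges
```

Under these assumptions each individual key lives on exactly 3 of the 4 machines, but because the copies rotate around the ring, every machine ends up storing about 75% of the data.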