1
votes

I run Spark in standalone mode. I want to process data, but currently I must copy it to the same path on every node. I have decided to use the Cassandra File System (CFS) to share data between all the nodes. But how do I run my Spark job so it can use Cassandra keyspace/table data that lives on another node? How do I make Cassandra tables accessible from all the nodes?

1
Can you please elaborate on what you are trying to achieve? Is it persisting data into a Cassandra database with Spark jobs? Also, being a distributed database, Cassandra by nature shares data between all nodes based on the replication factor. - Arka Ghosh
I have 3 Cassandra nodes (machines). I want to read data in Spark with sc.cassandraTable("kv", "tb"). How do I set the SparkConf? In new SparkConf(true).set("spark.cassandra.connection.host", "which node ip"), which Cassandra IP should I use? - Hamid
Use all three IPs, separated by commas. - Arka Ghosh

1 Answer

0
votes

You should give a comma-separated list of initial contact points. The connector will read the cluster metadata to discover all the nodes in the Cassandra cluster.

val conf = new SparkConf(true)
    .set("spark.cassandra.connection.host", "192.168.123.10,192.168.123.110")
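A fuller sketch of the setup, including the read from the comments above (keyspace "kv", table "tb"), using the Spark Cassandra Connector. The IP addresses and application name are placeholders; this requires a running Spark and Cassandra deployment, so it is an illustration rather than a self-contained runnable example:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds sc.cassandraTable

// Contact points: any subset of the cluster's nodes is enough here;
// the connector discovers the remaining nodes from cluster metadata.
// The IPs below are placeholders.
val conf = new SparkConf(true)
  .setAppName("cassandra-read-example") // hypothetical app name
  .set("spark.cassandra.connection.host", "192.168.123.10,192.168.123.110")

val sc = new SparkContext(conf)

// Read the table as an RDD. Rows are fetched from whichever replicas
// own the data, so nothing needs to be copied between nodes manually.
val rdd = sc.cassandraTable("kv", "tb")
println(rdd.count())
```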

Refer to the Spark Cassandra Connector documentation for parameter details.