
I have deployed nodes for Spark and Cassandra on Google Cloud. The DataStax Spark Cassandra Connector works fine locally, but it throws connection errors when I run the same code on Google Cloud. I have tried various permutations of a simple value-retrieval from Cassandra in Spark, all in vain. The Spark version deployed on Google Cloud is 1.1.0 and the Cassandra version is 3.0.0. We built the assembly package against the same Spark version.

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    // Simple row holder for the demo.movieslist table
    case class moviesugg(userid: Int, movie: String)

    object Main {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf(true)
          .set("spark.cassandra.connection.host", "104.197.133.174")
          .set("spark.cassandra.auth.username", "cassandra")
          .set("spark.cassandra.auth.password", "xxxxxxx")
        val sc = new SparkContext("local", "test", conf)

        val user_table = sc.cassandraTable("demo", "movieslist")
        val movie_index = user_table.map(r => moviesugg(r.getInt("userid"), r.getString("movie")))

        val file_collect = user_table.collect()
        file_collect.foreach(println)
      }
    }

I am getting the following error:

Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {104.197.133.174}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:174)

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /104.197.133.174:9042 (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces))

The keyspace and table schema are defined correctly (everything works fine locally), so I think it is more of a connection issue. I am pretty new to Cassandra, and I was wondering if anyone could suggest configuration changes to the cassandra.yaml file that would make the code work. I did try changing rpc_address and listen_address, but it didn't help.

Any suggestions would be greatly appreciated.


2 Answers

1 vote

It looks like you're trying to reach Cassandra on its public IP. As mentioned in another post, make sure that Cassandra is actually bound to that public IP (Spark is connecting on port 9042).
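
A quick way to verify the binding is to point cqlsh at the same address and port the connector uses, from a machine outside the node (the host, port, and credentials below are the ones from your question):

    cqlsh 104.197.133.174 9042 -u cassandra -p xxxxxxx

If cqlsh can't connect either, the problem is the binding or the firewall, not Spark.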

Assuming that's true, you'll also need to open up a GCE firewall rule (https://cloud.google.com/compute/docs/networking?hl=en#firewalls) to allow TCP:9042 traffic. This will be required even if Spark is running on a separate host within GCE since you're using the public IP.
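
For example, a rule along these lines should do it (the rule name is arbitrary, and you'd want to narrow --source-ranges to your Spark host's address rather than opening the port to everyone):

    gcloud compute firewall-rules create allow-cassandra \
        --allow tcp:9042 \
        --source-ranges 0.0.0.0/0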

If both Cassandra and Spark are running on the same host, you can use the localhost address. Or, if they're both running on separate hosts in the same Google Cloud Project, you should be able to use the private 10.x address and have Cassandra bind to that private address.
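
In that case, the relevant cassandra.yaml settings would look something like this (the 10.x address here is a placeholder for your node's actual private IP):

    # cassandra.yaml -- bind Cassandra to the instance's private address
    listen_address: 10.240.0.2
    rpc_address: 10.240.0.2

Restart Cassandra after changing these so the new bindings take effect.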

0 votes

First, check that the ports are actually open. Second, given the very distant versions of the two systems, Spark (v1.1.0) and Cassandra (v3.0.0), version incompatibility is likely your main issue. The "unconfigured table schema_keyspaces" error is a telltale sign: the older Java driver bundled with connector 1.x queries the system.schema_keyspaces table, which was removed in Cassandra 3.0. Please check this link for version compatibility:

https://github.com/datastax/spark-cassandra-connector#version-compatibility
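
For illustration, an aligned build might look like the following build.sbt fragment; the specific versions here (Spark 1.5.2 with connector 1.5.0, a combination that supports Cassandra 3.0) are assumptions you should verify against that table:

    // build.sbt -- versions are assumptions; confirm against the compatibility table
    scalaVersion := "2.10.5"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.5.2" % "provided",
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0"
    )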