0
votes

I am trying to execute a simple app example code with spark. Executing the job using spark submit. spark-submit --class "SimpleJob" --master spark://:7077 target/scala-2.10/simple-project_2.10-1.0.jar

15/03/08 23:21:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/08 23:21:53 WARN LoadSnappy: Snappy native library not loaded
15/03/08 23:22:09 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Lines with a: 21, Lines with b: 21

The job gives correct results but gives following errors below it:

15/03/08 23:22:28 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(<worker-host.domain.com>,53628)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
    at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
    at org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:205)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/03/08 23:22:28 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(<worker-host.domain.com>,53628) not found
15/03/08 23:22:28 WARN ConnectionManager: All connections not cleaned up

Following is the spark-defaults.conf

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              5g
spark.master                     spark://<master-ip>:7077
spark.eventLog.enabled           true
spark.executor.extraClassPath   $SPARK-HOME/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
spark.cassandra.connection.conf.factory com.datastax.spark.connector.cql.DefaultConnectionFactory
spark.cassandra.auth.conf.factory       com.datastax.spark.connector.cql.DefaultAuthConfFactory
spark.cassandra.query.retry.count       10

Following is the spark-env.sh

SPARK_LOCAL_IP=<master-ip in master worker-ip in workers>
SPARK_MASTER_HOST='<master-hostname>'
SPARK_MASTER_IP=<master-ip>
SPARK_MASTER_PORT=7077
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=4
1

1 Answers

0
votes

Got an answer to this,

Even though i am adding the cassandra connector to the class path by the command, i am not sending the same path to all nodes of cluster.

Now i am using below command sequence to do it properly

spark-shell --driver-class-path ~/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._
sc.addJar("~/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")

After these commands I am able to run all the read & write into my cassandra cluster properly using the spark RDDs.