Cassandra Spark job submission

Question

I am relatively new to Spark and Cassandra, so I have a basic question. I have compiled an uber jar and uploaded it to my Spark/Cassandra server. How do I run it in the Cassandra (DSE) environment? I know the command is "dse spark-submit", but when I run "dse spark-submit" I get a NullPointerException.

Here is the full output:

Exception in thread "main" java.lang.NullPointerException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The program code is very basic and has been proven to work in the Spark shell:

package xxx.seaoxxxx

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}


class test {
  def main(args: Array[String]){
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "xx.xxx.xx.xx")
      .setAppName("Seasonality")

    val sc = new SparkContext("spark://xx.xxx.xx.xx:7077", "Season", conf)

    val ks = "loadset"
    val incf =  "period"

    val rdd = sc.cassandraTable(ks, incf)
    rdd.count
    println("done with test")
    sc.stop()
  }
}

The spark-submit command is as follows:

dse spark-submit \
  --class xxx.seaoxxxx.test \
  --master spark://xxx.xx.x.xxx:7077 \
  /home/ubuntu/spark/Seasonality_v6-assembly-1.0.1.jar 100

Thanks,

Eric

You seem to be setting the application name twice, although I'm not sure whether that would break anything. I would try initializing all of the configuration in the conf object and see if that resolves it. Also, you shouldn't have to set the master if you are using dse spark-submit. — RussS
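
A minimal sketch of what the comment suggests, reusing the placeholder host, keyspace, and table names from the question. It keeps all configuration on the SparkConf, sets the application name once, and omits the master URL on the assumption that dse spark-submit (or its --master flag) supplies it. It also declares test as an object rather than a class. That change is an addition not mentioned in the comment: a main defined in a class is an instance method, and invoking it reflectively without an instance is one plausible source of the NullPointerException in the stack trace, though this has not been verified against the poster's setup.

```scala
package xxx.seaoxxxx

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// An `object` (not a `class`) gives the JVM a static main method,
// which is what a launcher looks for when it invokes main reflectively.
object test {
  def main(args: Array[String]): Unit = {
    // All settings live on the SparkConf; the app name is set exactly once.
    // No setMaster call: this assumes the launcher provides the master URL.
    val conf = new SparkConf(true)
      .setAppName("Seasonality")
      .set("spark.cassandra.connection.host", "xx.xxx.xx.xx") // placeholder host from the question

    val sc = new SparkContext(conf)

    val ks = "loadset"  // keyspace name from the question
    val incf = "period" // table name from the question

    val rdd = sc.cassandraTable(ks, incf)
    println(s"row count: ${rdd.count}")
    println("done with test")
    sc.stop()
  }
}
```

If the master still has to be given explicitly, passing it through --master on the command line keeps it out of the compiled jar.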

catpaws · Accepted Answer · 2014-11-09T15:01:38

The current release, DataStax Enterprise 4.5, supports dse spark-class instead of dse spark-submit: http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/spark/sparkStart.html?scroll=sparkStart__spkShrkLaunch
