We are trying to build a fat jar containing one small Scala source file and a ton of dependencies (a simple MapReduce-style example using Spark and Cassandra):
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.SparkConf

object VMProcessProject {

  def main(args: Array[String]) {
    val conf = new SparkConf()
      .set("spark.cassandra.connection.host", "127.0.0.1")
      .set("spark.executor.extraClassPath", "C:\\Users\\SNCUser\\dataquest\\ScalaProjects\\lib\\spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar")
    println("got config")

    val sc = new SparkContext("spark://US-L15-0027:7077", "test", conf)
    println("Got spark context")

    // read a Cassandra table as an RDD via the connector
    val rdd = sc.cassandraTable("test_ks", "test_col")
    println("Got RDDs")
    println(rdd.count())

    // map every row to 1 and sum, i.e. count the rows by hand
    val newRDD = rdd.map(x => 1)
    val count1 = newRDD.reduce((x, y) => x + y)
  }
}
We do not have a build.sbt file; instead we put the jars into a lib folder, keep the source files in src/main/scala, and run the project with sbt run. Our assembly.sbt file looks as follows:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
When we run sbt assembly we get the following error message:
...
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: java heap space
at java.util.concurrent...
We're not sure how to change the JVM settings to increase the memory, since we are using sbt assembly to build the jar. Also, if there is something egregiously wrong with how we are writing the code or building the project, pointing that out would help us a lot too; there have been so many headaches trying to set up a basic Spark program!
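In case it clarifies what we are asking: we assume the OutOfMemoryError happens inside sbt's own JVM (since the assembly task runs in the sbt process), so we imagine the fix involves passing a larger -Xmx to the JVM that launches sbt, e.g. via the SBT_OPTS or JAVA_OPTS environment variables, or conf\sbtconfig.txt on Windows. Something like the following is what we have in mind, though the 2G value is a guess and we are not sure which of these the Windows launcher actually reads:

REM guess: raise the heap of the JVM that runs sbt (and therefore sbt assembly)
set SBT_OPTS=-Xmx2G
sbt assembly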