
I am having issues running Apache Spark 1.0.1 within a Play! app. I am trying to run Spark inside the Play! application and use some of its basic machine learning (MLlib) functionality.

Here's how I create the SparkContext:

  import org.apache.spark.{SparkConf, SparkContext}

  def sparkFactory: SparkContext = {
    val logFile = "public/README.md" // Should be some file on your system
    val driverHost = "localhost"
    val conf = new SparkConf(false) // skip loading external settings
      .setMaster("local[4]") // run locally with enough threads
      .setAppName("firstSparkApp")
      .set("spark.logConf", "true")
      .set("spark.driver.host", driverHost)
    new SparkContext(conf)
  }
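
The controller code further down references a WorkingSpark object; a minimal sketch of such a holder (assumed here to be a lazy singleton around the factory above, so the whole app shares one SparkContext):

  import org.apache.spark.SparkContext

  // Assumed sketch of the WorkingSpark holder used below: a lazy val means
  // the SparkContext is created once, on first use, and shared app-wide.
  object WorkingSpark {
    lazy val context: SparkContext = sparkFactory
  }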

And here's the error I get when I try some basic operations on a tall-and-skinny matrix:

[error] o.a.s.e.ExecutorUncaughtExceptionHandler - Uncaught exception in thread Thread[Executor task launch worker-3,5,main]
java.lang.NoSuchMethodError: breeze.linalg.DenseVector$.dv_v_ZeroIdempotent_InPlaceOp_Double_OpAdd()Lbreeze/linalg/operators/BinaryUpdateRegistry;
    at org.apache.spark.mllib.linalg.distributed.RowMatrix$$anonfun$5.apply(RowMatrix.scala:313) ~[spark-mllib_2.10-1.0.1.jar:1.0.1]
    at org.apache.spark.mllib.linalg.distributed.RowMatrix$$anonfun$5.apply(RowMatrix.scala:313) ~[spark-mllib_2.10-1.0.1.jar:1.0.1]
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144) ~[scala-library-2.10.4.jar:na]
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144) ~[scala-library-2.10.4.jar:na]
    at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library-2.10.4.jar:na]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library-2.10.4.jar:na]

The error above is triggered by the following:

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.linalg.distributed.RowMatrix

  def computePrincipalComponents(datasetId: String) = Action {
    val datapoints = DataPoint.listByDataset(datasetId)

    // load the data into Spark, one dense vector per row
    val rows = datapoints.map(_.data).map { row =>
      row.map(_.toDouble)
    }
    val RDDRows = WorkingSpark.context.makeRDD(rows).map { line =>
      Vectors.dense(line)
    }

    val mat = new RowMatrix(RDDRows)
    val result = mat.computePrincipalComponents(mat.numCols().toInt)

    Ok(result.toString)
  }

It looks like a dependency issue, but I have no idea where it starts. Any ideas?


1 Answer


Ah, this was indeed caused by a dependency conflict. Apparently this version of Spark uses Breeze methods that were not available in the version I had explicitly pulled in. By removing Breeze from my Play! build file, I was able to run the function above just fine.
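
For anyone hitting the same thing, the fix amounts to letting spark-mllib bring in the Breeze version it was compiled against. A minimal sketch of what the relevant part of the build file might look like (the versions and the commented-out Breeze line are illustrative, not my exact build):

  // build.sbt (sketch): spark-mllib pulls in its own Breeze transitively,
  // so no explicit Breeze dependency should be declared alongside it.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"  % "1.0.1",
    "org.apache.spark" %% "spark-mllib" % "1.0.1"
    // "org.scalanlp" %% "breeze" % "..." // removed: conflicted with the version MLlib expects
  )

More generally, something like the sbt-dependency-graph plugin can show which library drags in which Breeze version when you need to track down a conflict like this.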

For those interested, here's the output:

-0.23490049167080018  0.4371989078912155    0.5344916752692394    ... (6 total)
-0.43624389448418854  0.531880914138611     0.1854269324452522    ...
-0.5312372137092107   0.17954211389001487   -0.456583286485726    ...
-0.5172743086226219   -0.2726152326516076   -0.36740474569706394  ...
-0.3996400343756039   -0.5147253632175663   0.303449047782936     ...
-0.21216780828347453  -0.39301803119012546  0.4943679121187219    ...