
Objective

I would like to understand the cause of, and the solution to, the problem described below. It happens when submitting the application with spark-submit. Appreciate the help.

spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar

The same code does not cause an error when run line by line in spark-shell:

...
scala>     val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
 |-- aucid: string (nullable = true)
 |-- bid: float (nullable = false)
 |-- bidtime: float (nullable = false)
 |-- bidder: string (nullable = true)
 |-- bidrate: integer (nullable = false)
 |-- openbid: float (nullable = false)
 |-- price: float (nullable = false)
 |-- itemtype: string (nullable = true)
 |-- dtl: integer (nullable = false)

Problem

Calling the toDF method to convert the RDD into a DataFrame causes the error below.

Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
    at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
    at AuctionDataFrame.main(AuctionDataFrame.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Code

import org.apache.spark.{SparkConf, SparkContext}

case class Auctions(
  aucid: String,
  bid: Float,
  bidtime: Float,
  bidder: String,
  bidrate: Int,
  openbid: Float,
  price: Float,
  itemtype: String,
  dtl: Int)

object AuctionDataFrame {
  val AUCID = 0
  val BID = 1
  val BIDTIME = 2
  val BIDDER = 3
  val BIDRATE = 4
  val OPENBID = 5
  val PRICE = 6
  val ITEMTYPE = 7
  val DTL = 8

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("AuctionDataFrame")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))
    val auctionsRDD = inputRDD.map(a =>
      Auctions(
        a(AUCID),
        a(BID).toFloat,
        a(BIDTIME).toFloat,
        a(BIDDER),
        a(BIDRATE).toInt,
        a(OPENBID).toFloat,
        a(PRICE).toFloat,
        a(ITEMTYPE),
        a(DTL).toInt))
    val auctionsDF = auctionsRDD.toDF()  // <--- line 52 causing the error.
  }
}
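
As an aside (not part of the original question): toDF() on an RDD of a case class goes through runtime reflection (it needs a TypeTag for Auctions), which is why the failure surfaces inside scala.reflect. A reflection-free way to build the same DataFrame in Spark 1.6 is to pass an explicit schema. The sketch below only illustrates that API; it reuses the names from main above and would not fix a genuine Scala binary mismatch.

    // Sketch only: builds the same DataFrame with an explicit schema instead of
    // the TypeTag-based toDF(); reuses inputRDD, sqlContext and the column
    // indices from main above.
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("aucid", StringType),
      StructField("bid", FloatType),
      StructField("bidtime", FloatType),
      StructField("bidder", StringType),
      StructField("bidrate", IntegerType),
      StructField("openbid", FloatType),
      StructField("price", FloatType),
      StructField("itemtype", StringType),
      StructField("dtl", IntegerType)))

    val rowRDD = inputRDD.map(a => Row(
      a(AUCID), a(BID).toFloat, a(BIDTIME).toFloat, a(BIDDER), a(BIDRATE).toInt,
      a(OPENBID).toFloat, a(PRICE).toFloat, a(ITEMTYPE), a(DTL).toInt))

    val auctionsExplicitDF = sqlContext.createDataFrame(rowRDD, schema)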

build.sbt

name := "Auction Project"

version := "1.0"

scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"

/* 
libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2",
    "org.apache.spark" %% "spark-sql" % "1.6.2",
    "org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
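
For reference (not part of the original build.sbt): the %% operator appends the project's Scala binary version to the artifact name, which is why keeping scalaVersion consistent with the Scala version of the Spark jars actually on the classpath matters. With scalaVersion := "2.11.8", the first line below resolves to exactly the second:

// "%%" picks the artifact for the project's Scala binary version (2.11 here).
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
// ...which is the same as naming the _2.11 artifact explicitly:
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2" % "provided"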

Environment

Spark on Ubuntu 14.04:

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)

sbt on Windows:

D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12
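
In the same spirit, sbt can report the Scala version it compiles the project with; a sketch assuming it is run from the project directory (the D:\auction-project path is only an example, not from the original post):

D:\auction-project>sbt scalaVersion
[info] Set current project to Auction Project (in build file:/D:/auction-project/)
[info] 2.11.8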

Research

Looked into similar issues, which suggest an incompatibility between the Scala version used to compile the application and the Scala version Spark itself was compiled with.

Hence changed the Scala version in build.sbt to 2.10, which produced a 2.10 jar, but the error persisted. Using % "provided" or not does not change the error.

scalaVersion := "2.10.6"
Comment (The Archetypal Paul): Still looks like a version issue. Check carefully what version's being used everywhere...

Comment (asker, in reply): @TzachZohar, thanks for the comment, but I changed the Scala version to "2.10.6" and ran "sbt clean" and then "sbt package" again, which did not solve the issue. Could you be more specific about how the linked article solves this?

1 Answer


Cause

Spark 1.6.2 was compiled from source with Scala 2.11. However, the pre-built spark-1.6.2-bin-without-hadoop.tgz had also been downloaded and placed in the lib/ directory.

I believe that because the spark-1.6.2-bin-without-hadoop.tgz binaries were compiled with Scala 2.10, they caused the compatibility issue.
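
A quick way to confirm which Scala version a given Spark installation was built with is from its spark-shell (this check is mine, not part of the original answer; the values shown match the Environment section of the question):

scala> scala.util.Properties.versionString   // Scala version the shell (and this Spark build) runs on
res0: String = version 2.11.7

scala> sc.version                            // Spark version, for completeness
res1: String = 1.6.2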

Fix

Remove spark-1.6.2-bin-without-hadoop.tgz from the lib directory and run "sbt package" with the library dependencies below.

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
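
After removing the stray jars, a clean rebuild and resubmit would look roughly like this (the hostname placeholder and jar name are taken from the question; the target path assumes the default sbt layout with scalaVersion 2.11):

sbt clean package
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 target/scala-2.11/auction-project_2.11-1.0.jar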