Objective
I would like to understand the cause of, and the solution to, the problem below. It happens only when the application is submitted with spark-submit. Any help is appreciated.
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar
The same code does not cause an error when run line by line in spark-shell.
...
scala> val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
|-- aucid: string (nullable = true)
|-- bid: float (nullable = false)
|-- bidtime: float (nullable = false)
|-- bidder: string (nullable = true)
|-- bidrate: integer (nullable = false)
|-- openbid: float (nullable = false)
|-- price: float (nullable = false)
|-- itemtype: string (nullable = true)
|-- dtl: integer (nullable = false)
Problem
Calling the toDF method to convert the RDD into a DataFrame causes the error below.
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
at AuctionDataFrame.main(AuctionDataFrame.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Code
import org.apache.spark.{SparkConf, SparkContext}

case class Auctions(
aucid: String,
bid: Float,
bidtime: Float,
bidder: String,
bidrate: Int,
openbid: Float,
price: Float,
itemtype: String,
dtl: Int)
object AuctionDataFrame {
val AUCID = 0
val BID = 1
val BIDTIME = 2
val BIDDER = 3
val BIDRATE = 4
val OPENBID = 5
val PRICE = 6
val ITEMTYPE = 7
val DTL = 8
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("AuctionDataFrame")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))
val auctionsRDD = inputRDD.map(a =>
Auctions(
a(AUCID),
a(BID).toFloat,
a(BIDTIME).toFloat,
a(BIDDER),
a(BIDRATE).toInt,
a(OPENBID).toFloat,
a(PRICE).toFloat,
a(ITEMTYPE),
a(DTL).toInt))
val auctionsDF = auctionsRDD.toDF() // <--- line 52 causing the error.
  }
}
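For reference, the same conversion could also be written without toDF(), using Spark's explicit-schema API (a sketch only, not what the submitted jar contains; the column names and types simply mirror the case class above):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Explicit schema matching the Auctions case class.
val schema = StructType(Seq(
  StructField("aucid", StringType),
  StructField("bid", FloatType, nullable = false),
  StructField("bidtime", FloatType, nullable = false),
  StructField("bidder", StringType),
  StructField("bidrate", IntegerType, nullable = false),
  StructField("openbid", FloatType, nullable = false),
  StructField("price", FloatType, nullable = false),
  StructField("itemtype", StringType),
  StructField("dtl", IntegerType, nullable = false)))

// Build Rows directly from the split lines and create the DataFrame
// without going through case-class reflection.
val rowRDD = inputRDD.map(a =>
  Row(a(AUCID), a(BID).toFloat, a(BIDTIME).toFloat, a(BIDDER),
      a(BIDRATE).toInt, a(OPENBID).toFloat, a(PRICE).toFloat,
      a(ITEMTYPE), a(DTL).toInt))

val auctionsDF2 = sqlContext.createDataFrame(rowRDD, schema)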
build.sbt
name := "Auction Project"
version := "1.0"
scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"
/*
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2",
"org.apache.spark" %% "spark-sql" % "1.6.2",
"org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
Environment
Spark on Ubuntu 14.04:
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/
Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
sbt on Windows:
D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12
Research
Looked into similar issues, which suggest an incompatibility between the Scala version that Spark was compiled with and the Scala version used to build the application jar.
Hence changed the Scala version in build.sbt to 2.10, which produced a 2.10 jar, but the error persisted. Marking the Spark dependencies as % "provided" or not does not change the error either.
scalaVersion := "2.10.6"
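One way to check whether mismatched Scala versions end up on the runtime classpath (a small diagnostic sketch, not output from the failing job) is to print the Scala version and the jar that scala-reflect is actually loaded from:

// Diagnostic sketch: print the runtime Scala version and the location of the
// scala-reflect classes loaded by the driver.
println(scala.util.Properties.versionString)
println(classOf[scala.reflect.api.JavaUniverse].getProtectionDomain.getCodeSource)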