Spark Error Running "stats()": could not find implicit value for parameter num: Numeric[Double]

Question

I'm learning Spark/Scala, having just written some MapReduce jobs.

I wrote some Java beans to help me parse a file in HDFS, and I want to reuse them to speed up my progress in Spark.

I've successfully loaded my file and created an RDD of my Java bean objects:

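val input = sc.textFile("hdfs://host:port/user/test/path/out")
import my.package.Record
// Parse each line of the file into a Record (the Java bean)
val clust_recs = input.map(line => new Record(line))
// Extract the premium field and ask Spark for summary statistics
clust_recs.map(rec => rec.getPremium()).stats()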

But the last line creates this error:

<console>:46: error: could not find implicit value for parameter num: Numeric[Double]

I've tested that the values in this field are all valid, so I am pretty sure I don't have any null values that could be causing this error.

Here is a sample of the values:

val dblArray = clust_recs.map(rec => rec.getPremium()).filter(d => !d.isNaN)
dblArray.take(10)

OUTPUT:

res82: Array[Double] = Array(1250.6, 433.72, 567.07, 219.24, 310.32, 2173.48, 195.0, 697.94, 711.46, 42.718050000000005)

I'm at a loss as to how to resolve this error, and I'm wondering whether I should just abandon the JavaBean class I've already written.

No, my.package.Record is a Java class (a traditional Java bean with getters and setters). - Jason Bowles

Jacek Laskowski · Accepted Answer · 2017-06-07T20:24:25

The stats operator is available on an RDD[T] only through implicit conversions: one for RDD[Double], and one for RDD[T] where an implicit Numeric[T] instance is in scope (see the code):

implicit def doubleRDDToDoubleRDDFunctions(rdd: RDD[Double]): DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd)
}

implicit def numericRDDToDoubleRDDFunctions[T](rdd: RDD[T])(implicit num: Numeric[T])
  : DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
}
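To see the mechanism without Spark, here is a minimal sketch that mirrors the shape of those two conversions on a plain Seq. SimpleStats, DoubleSeqFunctions and DoubleSeqImplicits are hypothetical names, not Spark API, and this stats() computes only count, mean and sum.

```scala
import scala.language.implicitConversions

// Hypothetical, simplified stand-in for Spark's StatCounter
final case class SimpleStats(count: Long, mean: Double, sum: Double)

// Hypothetical, simplified stand-in for DoubleRDDFunctions
final class DoubleSeqFunctions(xs: Seq[Double]) {
  def stats(): SimpleStats = {
    val s = xs.sum
    SimpleStats(xs.size.toLong, if (xs.isEmpty) Double.NaN else s / xs.size, s)
  }
}

object DoubleSeqImplicits {
  // Mirrors doubleRDDToDoubleRDDFunctions: applies only to Seq[scala.Double]
  implicit def doubleSeqToFunctions(xs: Seq[Double]): DoubleSeqFunctions =
    new DoubleSeqFunctions(xs)

  // Mirrors numericRDDToDoubleRDDFunctions: applies to Seq[T] only when an
  // implicit Numeric[T] can be found, converting each element to Double
  implicit def numericSeqToFunctions[T](xs: Seq[T])(implicit num: Numeric[T]): DoubleSeqFunctions =
    new DoubleSeqFunctions(xs.map(x => num.toDouble(x)))
}

object StatsDemo {
  def main(args: Array[String]): Unit = {
    import DoubleSeqImplicits._
    println(Seq(1.5, 2.5).stats()) // picked up by the Seq[Double] conversion
    println(Seq(1, 2, 3).stats())  // picked up by the Numeric[Int] conversion
  }
}
```

For a Seq[scala.Double] both conversions are eligible, but the non-generic one is more specific, so it wins, just as with the two Spark conversions.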

The implicit conversions are also mentioned in the scaladoc of DoubleRDDFunctions:

Extra functions available on RDDs of Doubles through an implicit conversion.

The point is that the following line does not give you an RDD[Double], but an RDD of something else:

clust_recs.map(rec => rec.getPremium())

That's the reason for the following compilation error:

error: could not find implicit value for parameter num: Numeric[Double]

The Scala compiler can't find an implicit Numeric[Double] value to pass as the implicit parameter num of:

implicit def numericRDDToDoubleRDDFunctions[T](rdd: RDD[T])(implicit num: Numeric[T])
  : DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
}
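Note that num is a type class instance rather than a conversion: the compiler must find an implicit value of type Numeric[T] for the element type T. A minimal Spark-free sketch of that lookup, using a hypothetical NumericLookup object and only the Scala standard library:

```scala
object NumericLookup {
  // Compiles: the standard library provides an implicit Numeric for scala.Double
  val scalaDoubleNum: Numeric[Double] = implicitly[Numeric[Double]]

  // Would NOT compile if uncommented: there is no implicit Numeric for the
  // boxed java.lang.Double, so implicit search fails
  // val javaDoubleNum = implicitly[Numeric[java.lang.Double]]

  // Same parameter-list shape as numericRDDToDoubleRDDFunctions, minus Spark
  def toDoubles[T](xs: Seq[T])(implicit num: Numeric[T]): Seq[Double] =
    xs.map(x => num.toDouble(x))

  // Calling toDoubles on a Seq[java.lang.Double] fails for the same reason:
  // toDoubles(Seq[java.lang.Double](1.0, 2.0))
}
```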

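I can only guess that this Double is Java's java.lang.Double rather than Scala's scala.Double, hence the compilation error: Scala's standard library provides an implicit Numeric for scala.Double but none for java.lang.Double. (The error message can print java.lang.Double simply as Double, which makes it look as if Scala's Double had no instance.)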
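If that guess is right, unboxing each value to scala.Double before aggregating should be enough. A minimal sketch without Spark, assuming a hypothetical Scala stand-in for the Java bean whose getPremium() returns java.lang.Double:

```scala
// Hypothetical stand-in for the question's Java bean: getPremium() returns a
// boxed java.lang.Double, as the answer suspects the real Record does
class Record(premium: java.lang.Double) {
  def getPremium(): java.lang.Double = premium
}

object UnboxFix {
  def premiums(recs: Seq[Record]): Seq[Double] =
    // doubleValue() unboxes java.lang.Double into scala.Double, so the result
    // is a Seq[scala.Double] and an implicit Numeric[Double] can be found
    recs.map(rec => rec.getPremium().doubleValue())

  def main(args: Array[String]): Unit = {
    // Scala Double literals are boxed to java.lang.Double by Predef.double2Double
    val recs = Seq(new Record(1250.6), new Record(433.72))
    // recs.map(_.getPremium()).sum would not compile: no Numeric[java.lang.Double]
    println(premiums(recs).sum)
  }
}
```

In the Spark shell the analogous line would be `clust_recs.map(rec => rec.getPremium().doubleValue()).stats()`, which produces an RDD[Double] and so picks up doubleRDDToDoubleRDDFunctions, assuming getPremium() does return java.lang.Double. Ascribing the type, as in `rec.getPremium(): Double`, relies on Predef's Double2double unboxing and has the same effect. Either way a null premium would throw a NullPointerException during unboxing.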
