
I'm learning Spark/Scala after writing a few MapReduce jobs.

I wrote some Java beans to parse a file in HDFS, and I want to reuse them to speed up my progress in Spark.

I've had success loading my file and creating an RDD of my Java bean objects:

val input = sc.textFile("hdfs://host:port/user/test/path/out")
import my.package.Record
val clust_recs = input.map(line => new my.package.Record(line))
clust_recs.map(rec => rec.getPremium()).stats()

But the last line creates this error:

<console>:46: error: could not find implicit value for parameter num: Numeric[Double]

I've tested that the values in this field are all valid, so I am pretty sure I don't have any null values that could be causing this error.

Here is an example of values:

val dblArray = clust_recs.map(rec => rec.getPremium()).filter(!isNaN(_))


res82: Array[Double] = Array(1250.6, 433.72, 567.07, 219.24, 310.32, 2173.48, 195.0, 697.94, 711.46, 42.718050000000005)

I'm at a loss as to how to resolve this error, and I'm wondering whether I should just abandon the idea of using the JavaBean class I've already created.

Is my.package.Record a case class? - Ramesh Maharjan
No, my.package.Record is a Java class (a traditional Java bean, with getters and setters). - Jason Bowles
What's the signature of Record.getPremium()? - Jacek Laskowski
What's the Spark version? - Jacek Laskowski

2 Answers


The stats operator is only available on an RDD[T] through implicit conversions: either for RDD[Double], or for RDD[T] where an implicit Numeric[T] is in scope (see the code):

implicit def doubleRDDToDoubleRDDFunctions(rdd: RDD[Double]): DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd)
}

implicit def numericRDDToDoubleRDDFunctions[T](rdd: RDD[T])(implicit num: Numeric[T])
  : DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
}

The implicit conversions are also mentioned in the scaladoc of DoubleRDDFunctions:

Extra functions available on RDDs of Doubles through an implicit conversion.
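To make the mechanism concrete without needing a Spark cluster, here is a minimal sketch of the same pattern on a plain Seq. The names SeqStatsSketch, DoubleSeqFunctions, numericSeqToDoubleSeqFunctions and mean are hypothetical stand-ins, not Spark API; only the shape (an implicit conversion that itself requires an implicit Numeric[T]) is meant to mirror the code above.

```scala
import scala.language.implicitConversions

object SeqStatsSketch {
  // Hypothetical stand-in for DoubleRDDFunctions: holds the extra methods.
  class DoubleSeqFunctions(xs: Seq[Double]) {
    def mean: Double = if (xs.isEmpty) Double.NaN else xs.sum / xs.size
  }

  // Same shape as numericRDDToDoubleRDDFunctions: the conversion applies
  // only when an implicit Numeric[T] exists for the element type T.
  implicit def numericSeqToDoubleSeqFunctions[T](xs: Seq[T])(
      implicit num: Numeric[T]): DoubleSeqFunctions =
    new DoubleSeqFunctions(xs.map(x => num.toDouble(x)))

  // Seq has no `mean` method, so the compiler applies the conversion.
  val intMean: Double = Seq(1, 2, 3).mean
  val doubleMean: Double = Seq(1.5, 2.5).mean
}
```

Both calls compile because the standard library provides Numeric[Int] and Numeric[Double] instances for Scala's own number types.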

The point is that the following line does not give you RDD[Double], but something else.

clust_recs.map(rec => rec.getPremium())

That's the reason for the following compilation error:

error: could not find implicit value for parameter num: Numeric[Double]

The Scala compiler can't find an implicit value of type Numeric[Double] to pass as the parameter called num:

implicit def numericRDDToDoubleRDDFunctions[T](rdd: RDD[T])(implicit num: Numeric[T])
  : DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
}

I can only guess that the Double in the error message is Java's java.lang.Double, not Scala's Double, and that this is what causes the compilation error.
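If that guess is right, the difference can be reproduced in plain Scala. The sketch below is an illustration under that assumption: BoxedVersusScalaDouble and total are hypothetical names, and the sample values are taken from the question. The standard library has a Numeric instance for scala.Double but, as far as I know, none for java.lang.Double, so the commented-out call is expected to be rejected by the compiler.

```scala
object BoxedVersusScalaDouble {
  // Requires an implicit Numeric[T], like the `num` parameter in the error.
  def total[T](xs: Seq[T])(implicit num: Numeric[T]): Double =
    xs.map(x => num.toDouble(x)).sum

  // Boxed values, as a Java getter returning java.lang.Double would produce.
  val boxed: Seq[java.lang.Double] =
    Seq(1250.6, 433.72).map(d => java.lang.Double.valueOf(d))

  // total(boxed)  // expected not to compile: no Numeric[java.lang.Double]

  // Unboxing to scala.Double lets the compiler find Numeric[Double].
  val unboxed: Seq[Double] = boxed.map(_.doubleValue())
  val sum: Double = total(unboxed)
}
```

The REPL prints java.lang.Double simply as Double, which is why the error message looks as if it were about Scala's Double.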


Thanks for pointing me in the right direction, Jacek. Your answer got me searching for how to convert java.lang.Double to scala.Double in the map function.

As a new user of Scala, I'm struggling to get a handle on the differences from Java, especially implicit conversions.

I found this post very helpful: http://www.scala-archive.org/scala-Double-td1939353.html

and ultimately changed the code to this:

clust_recs.map(rec => rec.getPremium().doubleValue()).stats()


res28: org.apache.spark.util.StatCounter = (count: 1000000, mean: 170.636, stdev: 28.13, max: 2180.000000, min: 0.000000)
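One caveat: calling doubleValue() on a null java.lang.Double throws a NullPointerException. My data had no nulls, but if a getter can return null, a null-tolerant variant might look like the sketch below. NullSafePremiums and unboxAll are hypothetical names, and the example uses a plain Seq; I would expect the same function body to work inside an RDD flatMap, but I have not verified that here.

```scala
object NullSafePremiums {
  // Hypothetical helper: drops nulls, then unboxes to scala.Double.
  def unboxAll(values: Seq[java.lang.Double]): Seq[Double] =
    values.flatMap(v => Option(v).map(_.doubleValue()))

  // Sample values from the question, plus one null for illustration.
  val sample: Seq[java.lang.Double] =
    Seq(java.lang.Double.valueOf(195.0), null, java.lang.Double.valueOf(697.94))

  val cleaned: Seq[Double] = unboxAll(sample)
}
```

Option(v) turns a null into None, so the null entry is skipped instead of failing the job.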