I'm learning Spark/Scala after having written only a few MapReduce jobs.

I wrote some Java beans to help me parse a file in HDFS, and I want to reuse them to speed up my progress in Spark.

I've had success loading my file and creating an RDD of my Java bean objects:

val clv_input = sc.textFile("hdfs://host:port/user/test/path/out")
import my.package.Record
val clust_recs = clv_input.map(line => new my.package.Record(line))
clust_recs.map(rec => rec.getPremium()).stats()

But the last line creates this error:

<console>:46: error: could not find implicit value for parameter num: Numeric[Double]

I've tested that the values in this field are all valid, so I am pretty sure I don't have any null values that could be causing this error.

Here is an example of values:

val dblArray = clust_recs.map(rec => rec.getPremium()).filter(!isNaN(_))
dblArray.take(10)

OUTPUT:

res82: Array[Double] = Array(1250.6, 433.72, 567.07, 219.24, 310.32, 2173.48, 195.0, 697.94, 711.46, 42.718050000000005)

I'm at a loss as to how to resolve this error, and I'm wondering whether I should just abandon the idea of using the JavaBean class that I've already created.

Is my.package.Record a case class? - Ramesh Maharjan
No, my.package.Record is a Java class (a traditional Java bean with getters and setters). - Jason Bowles
What's the signature of Record.getPremium()? - Jacek Laskowski
What's the Spark version? - Jacek Laskowski

2 Answers

The stats operator is only available on an RDD[T] through an implicit conversion, which exists for RDD[Double] and for any RDD[T] whose element type T has an implicit Numeric[T] in scope (see the code):

implicit def doubleRDDToDoubleRDDFunctions(rdd: RDD[Double]): DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd)
}

implicit def numericRDDToDoubleRDDFunctions[T](rdd: RDD[T])(implicit num: Numeric[T])
  : DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
}
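The pattern in the second conversion (an implicit conversion that itself demands an implicit Numeric[T]) can be tried without Spark. The following is only a minimal sketch: Box and DoubleOps are hypothetical stand-ins for RDD and DoubleRDDFunctions, not Spark classes, and mean is a made-up operation in place of stats.

```scala
import scala.language.implicitConversions

object ImplicitSketch {
  // Hypothetical stand-in for RDD[T]; not a Spark class.
  final class Box[T](val items: Seq[T])

  // Hypothetical stand-in for DoubleRDDFunctions; not a Spark class.
  final class DoubleOps(items: Seq[Double]) {
    def mean: Double = items.sum / items.size
  }

  // Same shape as numericRDDToDoubleRDDFunctions: the conversion applies
  // only when the compiler can supply an implicit Numeric[T].
  implicit def numericBoxToDoubleOps[T](box: Box[T])(implicit num: Numeric[T]): DoubleOps =
    new DoubleOps(box.items.map(x => num.toDouble(x)))

  // Compiles because the standard library provides Numeric[Int].
  val intMean: Double = new Box(Seq(1, 2, 3)).mean

  // Does not compile, because there is no Numeric[String]:
  // new Box(Seq("a", "b")).mean
}
```

If the element type has no Numeric instance, the conversion is simply not applicable and the extra methods are unavailable.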

The implicit conversions are also mentioned in the scaladoc of DoubleRDDFunctions:

Extra functions available on RDDs of Doubles through an implicit conversion.

The point is that the following line does not give you RDD[Double], but something else.

clust_recs.map(rec => rec.getPremium())

That's the reason for the following compilation error:

error: could not find implicit value for parameter num: Numeric[Double]

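The Scala compiler can't find an implicit value of type Numeric[T] for the element type of that RDD, which is what the num parameter of numericRDDToDoubleRDDFunctions requires. Note that num is an implicit parameter, not an implicit conversion.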

I can only guess that the Double in the error message is Java's java.lang.Double, not Scala's Double, hence the compilation error.
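If that guess is right, the difference can be seen in plain Scala. This is a sketch under that assumption: getPremium below is a hypothetical stand-in for the bean's getter and is assumed to return a boxed java.lang.Double.

```scala
object BoxedVsPrimitive {
  // Hypothetical stand-in for Record.getPremium(); assumed to return java.lang.Double.
  def getPremium(): java.lang.Double = java.lang.Double.valueOf(1250.6)

  // The standard library provides a Numeric instance for scala.Double...
  val scalaNumeric: Numeric[Double] = implicitly[Numeric[Double]]

  // ...but none for java.lang.Double, so this line would not compile:
  // implicitly[Numeric[java.lang.Double]]

  // Unboxing explicitly yields a scala.Double, for which Numeric exists.
  val unboxed: Double = getPremium().doubleValue()
}
```

Both types print as Double in a compiler message, which is why the error can look as if Numeric[Double] itself were missing.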

Thanks for pointing me in the right direction, Jacek. Your answer got me searching for how to convert a java.lang.Double to a scala.Double in the map function.

As a new user of Scala, I'm struggling to get a handle on the differences from Java, especially implicit conversions.

I found this post very helpful: http://www.scala-archive.org/scala-Double-td1939353.html

and ultimately changed the code to this:

clust_recs.map(rec => rec.getPremium().doubleValue()).stats()

OUTPUT:

res28: org.apache.spark.util.StatCounter = (count: 1000000, mean: 170.636, stdev: 28.13, max: 2180.000000, min: 0.000000)
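The same conversion can be checked on an ordinary collection, since Seq.sum also needs an implicit Numeric for its element type. This is just a sketch: the values are sample numbers standing in for the getPremium() results, and the type ascription variant relies on the standard Predef conversion from java.lang.Double to Double.

```scala
object UnboxDemo {
  // Sample values standing in for rec.getPremium() results (boxed on purpose).
  val boxed: Seq[java.lang.Double] =
    Seq(1250.6, 433.72, 567.07).map(d => java.lang.Double.valueOf(d))

  // boxed.sum  // would not compile: no implicit Numeric for java.lang.Double

  // Option 1: unbox explicitly, as in the fix above.
  val viaMethod: Seq[Double] = boxed.map(_.doubleValue())

  // Option 2: a type ascription triggers Predef's java.lang.Double => Double conversion.
  val viaAscription: Seq[Double] = boxed.map(d => d: Double)

  // With scala.Double elements, the implicit Numeric[Double] is found.
  val total: Double = viaMethod.sum
}
```

Both options throw a NullPointerException on a null element, so null premiums would need to be filtered out before the conversion.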