When I call RDD.mapValues(...).reduceByKey(...), my code does not compile, but when I reverse the order to RDD.reduceByKey(...).mapValues(...), it does. The types appear to match up.
A complete minimal reproducing example is:
import org.apache.spark.SparkContext

def test[E]() =
  new SparkContext().textFile("")
    .keyBy(_ ⇒ 0L)                 // RDD[(Long, String)]
    .mapValues(_.asInstanceOf[E])  // RDD[(Long, E)]
    .reduceByKey((x, _) ⇒ x)       // does not compile
The compilation error is the same as in this question, but the remedy suggested there does not help:
Test.scala:7: error: value reduceByKey is not a member of org.apache.spark.rdd.RDD[(Long, E)]
possible cause: maybe a semicolon is missing before `value reduceByKey'?
.reduceByKey((x, _) ⇒ x)
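For comparison, here is the reversed order, the same setup with only the two calls swapped, which does compile:

import org.apache.spark.SparkContext

def test[E]() =
  new SparkContext().textFile("")
    .keyBy(_ ⇒ 0L)                 // RDD[(Long, String)]
    .reduceByKey((x, _) ⇒ x)       // still RDD[(Long, String)]
    .mapValues(_.asInstanceOf[E])  // RDD[(Long, E)]; compiles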
This issue seems to be more at the Scala level than at the Spark level. Replacing the type parameter with a concrete type such as Int works, so it might be a problem with type inference. I am using Spark 2.2.0 with Scala 2.11.
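For reference, this is the variant with the type parameter replaced by Int, which compiles fine:

import org.apache.spark.SparkContext

def test() =
  new SparkContext().textFile("")
    .keyBy(_ ⇒ 0L)
    .mapValues(_.asInstanceOf[Int])  // RDD[(Long, Int)]
    .reduceByKey((x, _) ⇒ x)         // compiles with a concrete value type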