
How can I get a higher-order function in Scala to properly accept a Spark filter predicate? I.e.

import org.apache.spark.sql.functions.col
import spark.implicits._

val df = Seq(1, 2, 3, 4).toDF("value")

df.filter(col("value") > 2).show
df.filter(col("value") < 2).show

works just fine. But when I try to refactor it into a function that accepts a filter predicate (note: the same signature as the > operator), the compiler can no longer resolve the left/right operands to pass to the predicate.

def myFilter = (predicate: Any => Column)(df: DataFrame) = {
  df.filter(col("value") predicate 2).show // WARN: this does not compile
}

df.transform(myFilter(>)).show

How can this be made to work?

In Scala, a x b means a.x(b), so col("value") predicate 2 means col("value").predicate(2), which is not what you want. Perhaps something more like predicate(col("value"), 2)? - Tim
No, this is still not enough. df.filter(col("value").predicate(2)) also does not work. - Georg Heiler
You need to change it to predicate(col("value"), 2). You also need to change the signature of predicate. I don't know enough about Spark to give the full answer (which is why I am using comments), but it should be something like (Column, Int) => Boolean. Then you pass _ > _ rather than just > to myFilter. - Tim
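
To make Tim's infix point concrete, here is a minimal sketch (assuming the standard Spark imports): in Scala, infix notation is nothing more than method application, so comparing a Column is just a call to its > method, which returns another Column.

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

val c: Column = col("value")
val infix: Column = c > 2      // infix notation
val explicit: Column = c.>(2)  // the identical call written out: Column's > method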

1 Answer


Combining the various comments gives this as a possible solution (note that a Column comparison yields another Column, not a Boolean, which is why the predicate's return type is Column):

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

def myFilter(predicate: (Column, Int) => Column)(df: DataFrame): DataFrame =
  df.filter(predicate(col("value"), 2))

df.transform(myFilter(_ > _)).show
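
As a possible generalization (not part of the answer above), the predicate could be curried over a single Column so the comparison value is chosen at the call site rather than hard-coded to 2. A sketch, using the hypothetical name myColumnFilter and the standard lit helper:

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical variant: the caller supplies the whole comparison, threshold included.
def myColumnFilter(predicate: Column => Column)(df: DataFrame): DataFrame =
  df.filter(predicate(col("value")))

df.transform(myColumnFilter(_ > lit(2))).show
df.transform(myColumnFilter(_ < lit(10))).show

This keeps the same curried shape as myFilter, so it still composes with DataFrame.transform.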