UDF scala spark syntax

Question

I was trying to use a UDF in Spark and noticed there are three different ways to declare one. From a Scala syntax perspective, what does each of these declarations mean, and how can the same udf function be called in three different ways? Coming from Java, the last one is straightforward to me, but the first two are not clear.

import org.apache.spark.sql.functions.udf

// You could define a UDF this way
val upperUDF1 = udf { s: String => s.toUpperCase }

// or this way
val upperUDF2 = udf[String, String](_.toUpperCase)

// or even this way!
def upperUDF3 = udf((data: String) => data.toUpperCase)

Thanks @RameshMahrjan. After some reading I figured out that when a method is called with a single argument, curly braces and parentheses are interchangeable. As I understand it, udf is defined as a generic method, so we can also pass the type parameters explicitly when calling it.

There is no difference between the first and the third one, and the second one differs only in that the type parameters are passed explicitly. - Ramesh Maharjan
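To make the syntax point concrete without needing Spark on the classpath, here is a minimal sketch in plain Scala 2. The names miniUdf and MiniUdf are hypothetical stand-ins, not Spark API: Spark's real udf also requires implicit TypeTags and returns a UserDefinedFunction. The sketch only keeps the part that matters here, the order of the type parameters (return type first, then argument type), and shows that all three declarations are ordinary calls to one generic method that takes a function.

```scala
object UdfSyntaxSketch {
  // Hypothetical wrapper standing in for Spark's UserDefinedFunction.
  final case class MiniUdf[A, R](f: A => R) {
    def apply(a: A): R = f(a)
  }

  // Hypothetical stand-in for udf: a generic method taking a one-argument function.
  // As with Spark's udf, the return type comes first in the type parameter list.
  def miniUdf[R, A](f: A => R): MiniUdf[A, R] = MiniUdf(f)

  // 1. Braces replace the parentheses around the single argument (Scala 2 syntax).
  //    The parameter type is annotated, so A and R are inferred.
  val upper1 = miniUdf { s: String => s.toUpperCase }

  // 2. Type parameters are given explicitly, so the placeholder `_`
  //    needs no type annotation.
  val upper2 = miniUdf[String, String](_.toUpperCase)

  // 3. Plain parentheses, but declared with def instead of val.
  def upper3 = miniUdf((data: String) => data.toUpperCase)

  def main(args: Array[String]): Unit = {
    // All three behave the same when applied.
    assert(upper1("spark") == "SPARK")
    assert(upper2("spark") == "SPARK")
    assert(upper3("spark") == "SPARK")

    // A val is evaluated once; a def is re-evaluated on every access.
    assert(upper1 eq upper1)
    assert(!(upper3 eq upper3))
  }
}
```

The one practical difference between the first and the third form is val versus def: the def builds a new wrapper every time it is referenced. For a small UDF this is usually harmless, but a val is the more common choice.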

Michael Cherniavsky · Accepted Answer · 2018-11-05T19:10:20

You are right that there are several ways to do it. I prefer the following form, which works well for me:

val removeBrackets = udf { (input_str: String) =>
  // Guard against null, since the column may contain null values
  if (input_str != null && (input_str.contains("[") || input_str.contains("]"))) {
    // Strip every '[' and ']' character
    input_str.replaceAll("[\\[\\]]", "")
  } else {
    input_str
  }
}
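For completeness, a UDF like this could be applied to a DataFrame column roughly as follows. This is only a sketch and rests on assumptions that are not in the answer: a local SparkSession, Spark SQL on the classpath, and made-up names for the column ("raw"), the output column ("clean"), and the sample data. The UDF body is a condensed version of the one above: replaceAll leaves strings without brackets unchanged, so only the null check is kept.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object RemoveBracketsUsage {
  // Condensed version of the answer's UDF: null-safe bracket removal.
  val removeBrackets = udf { (input_str: String) =>
    if (input_str != null) input_str.replaceAll("[\\[\\]]", "") else input_str
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session just for trying the UDF out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-usage-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, including a null to exercise the guard.
    val df = Seq("[spark]", "scala", null).toDF("raw")

    // A UDF is applied to Column arguments like any built-in function.
    df.withColumn("clean", removeBrackets(col("raw"))).show()

    spark.stop()
  }
}
```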
