Creating and using Spark-Hive UDF for Date

Question

Note: this question is linked from this question: Creting UDF function with NonPrimitive Data Type and using in Spark-sql Query: Scala

I have created a method in scala:

    package test.udf.demo
    object UDF_Class {
    def transformDate( dateColumn: String, df: DataFrame) : DataFrame = {
    val sparksession = SparkSession.builder().appName("App").getOrCreate()
    val d=df.withColumn("calculatedCol", month(to_date(from_unixtime(unix_timestamp(col(dateColumn),  "dd-MM-yyyy")))))
    df.withColumn("date1",  when(col("calculatedCol") === "01",  concat(concat(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol"), "dd-MM- yyyy"))))-1,  lit('-')),substring(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol")), "dd-MM- yyyy"))),3,4))
    .when(col("calculatedCol") ===  "02",concat(concat(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol"), "dd-MM- yyyy"))))-1,  lit('-')),substring(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol")), "dd-MM- yyyy"))),3,4)))
    .when(col("calculatedCol") ===  "03",concat(concat(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol"), "dd-MM- yyyy"))))-1,  lit('-')),substring(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol")), "dd-MM-yyyy"))),3,4)))
    .otherwise(concat(concat(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol"), "dd-MM-  yyyy")))), lit('-')), substring(year(to_date(from_unixtime(unix_timestamp(col("calculatedCol"), "dd-MM-yyyy")))) + 1, 3, 4))))) 
    val d1=sparksession.udf.register("transform",transformDate _)
    d
    }
    }

I want to use this transformDate method in my sparksql query which is separate scala code in same package.

    package test.udf.demo
    import test.udf.demo.transformDate
    //sparksession
    sparksession.sql("select id,name,salary,transform(dob) from dbname.tablename")

but I get an error

not a temp or permanent registered function in default database

Can someone please guide me?

QuickSilver QuickSilver · Accepted Answer · 2020-05-07T16:17:08

AFAIK Spark user defined udfs can cannot accept or return DataFrame. That is stopping your udf from registration

Creating and using Spark-Hive UDF for Date

2 Answers