I have a schema and the names of the columns to apply a UDF to. The column names are user input, and their number can differ for each input. Is there a way to apply a UDF to N columns in a DataFrame?

This is what I am trying to achieve, for a schema with, say, col1, col2, col3, col4, col5:

  DataFrame newDF = df.withColumn("col2", callUDF("test", df.col("col2")));
  or
  DataFrame newDF = df.withColumn("col2", callUDF("test", df.col("col2")))
                      .withColumn("col3", callUDF("test", df.col("col3")));
  or
  DataFrame newDF = df.withColumn("col2", callUDF("test", df.col("col2")))
                      .withColumn("col3", callUDF("test", df.col("col3")))
                      .withColumn("col5", callUDF("test", df.col("col5")));
  or the same for N columns.

Any ideas?

1 Answer

I ended up writing code that dynamically generates a Spark SQL query applying the UDF to 1 to N columns. I then register the input DataFrame as a temp table and run the generated query against it.
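
A minimal sketch of what the query-generation step could look like, in plain Java with no Spark dependency. This is not the original code: the class name `UdfQueryBuilder`, the method `buildQuery`, and the table name `input_table` are made up for illustration. It assumes the UDF takes a single column and that the column names are plain identifiers that need no quoting.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark: builds the SELECT statement as a string.
class UdfQueryBuilder {

    /**
     * Builds "SELECT ... FROM tableName". Every column in udfColumns is
     * emitted as "udfName(col) AS col"; all other columns pass through
     * unchanged. The order of allColumns is preserved.
     */
    static String buildQuery(List<String> allColumns,
                             Set<String> udfColumns,
                             String udfName,
                             String tableName) {
        StringJoiner select = new StringJoiner(", ");
        for (String col : allColumns) {
            if (udfColumns.contains(col)) {
                // Keep the original column name so the output schema is unchanged.
                select.add(udfName + "(" + col + ") AS " + col);
            } else {
                select.add(col);
            }
        }
        return "SELECT " + select.toString() + " FROM " + tableName;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("col1", "col2", "col3", "col4", "col5");
        // Assumption: these are the user-supplied column names.
        Set<String> targets = new LinkedHashSet<>(Arrays.asList("col2", "col3", "col5"));
        System.out.println(buildQuery(schema, targets, "test", "input_table"));
    }
}
```

With the Spark 1.x API used in the question, the resulting string would presumably be run along the lines of `df.registerTempTable("input_table")` followed by `sqlContext.sql(query)`, with the UDF already registered under the name `test`. Since the column names come from user input, it is worth checking them against the DataFrame's schema before splicing them into the query.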