Similar question to the one here, but I don't have enough points to comment there.
According to the latest Spark documentation, a UDF can be used in two different ways: one with SQL and another directly on a DataFrame. I found multiple examples of how to use a UDF with SQL, but have not been able to find any on how to use a UDF directly on a DataFrame.
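
For context, the SQL route I keep finding examples of looks roughly like this (the table name myTable is just a placeholder, and the registration is the same as in my code further down):

df.registerTempTable("myTable"); // placeholder temp table name
DataFrame result = sqlContext.sql("SELECT mode(types) FROM myTable");
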
The solution provided by the OP on the question linked above uses __callUDF()__, which is _deprecated_ and will be removed in Spark 2.0 according to the Spark Java API documentation. There, it says:

"since it's redundant with udf()"

so this means I should be able to use __udf()__ to call my UDF, but I can't figure out how to do that. I have not stumbled on anything that spells out the syntax for Java-Spark programs. What am I missing?
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
.
.
UDF1<String[], String> mode = new UDF1<String[], String>() {
    @Override
    public String call(final String[] types) throws Exception {
        // placeholder implementation: just return the first element
        return types[0];
    }
};

sqlContext.udf().register("mode", mode, DataTypes.StringType);
df.????????

How do I call my UDF (__mode__) on a given column of my DataFrame __df__?
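
The closest I can come up with, going purely by the __org.apache.spark.sql.functions__ Javadoc, is the sketch below (the column name types is just an example). Is this the right way, or is there a __udf()__-based equivalent I should be using instead?

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

// guess: call the UDF registered above by name on the "types" column
DataFrame withMode = df.select(callUDF("mode", col("types")));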