How to add a new column to a Spark DataFrame by computing the edit distance between two String columns

Question

I am new to Scala and Spark. I want to derive a new column from existing columns of a DataFrame by computing the edit distance between them. For example, FNAME and LNAME are two columns of the DataFrame, and I want to add a new column called NAMESCORE that holds the edit distance from FNAME to LNAME. Please advise with working code or pseudocode.

Here is a link where I found a partial answer:

Derive multiple columns from a single column in a Spark DataFrame

Himaprasoon Himaprasoon · Accepted Answer · 2016-03-22T07:04:32

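You can use a UDF. First define a plain Scala function:

def udfToFindEditDistance(col1: String, col2: String): Int = {
  // Compute and return the edit distance between col1 and col2.
  ??? // placeholder: supply your edit-distance implementation here
}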
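One way to fill in that function body is the standard Levenshtein dynamic-programming recurrence. The following is a minimal sketch in plain Scala, not part of the original answer: the names EditDistance and levenshtein are illustrative, and treating a null input as an empty string is an assumption you may want to change.

```scala
object EditDistance {
  // Levenshtein distance: the minimum number of single-character
  // insertions, deletions, or substitutions that turn `a` into `b`.
  def levenshtein(a: String, b: String): Int = {
    // Assumption: a null input is treated as the empty string.
    val s = if (a == null) "" else a
    val t = if (b == null) "" else b

    // prev(j) = distance between the first (i - 1) chars of s and the first j chars of t.
    var prev = Array.range(0, t.length + 1)
    for (i <- 1 to s.length) {
      val curr = new Array[Int](t.length + 1)
      curr(0) = i // deleting i characters turns a prefix of s into ""
      for (j <- 1 to t.length) {
        val cost = if (s(i - 1) == t(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // substitution or match
        )
      }
      prev = curr
    }
    prev(t.length)
  }
}
```

With this in scope, udfToFindEditDistance could simply return EditDistance.levenshtein(col1, col2).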

Wrap the function as a UDF:

import org.apache.spark.sql.functions.udf
val newUDF = udf(udfToFindEditDistance(_: String, _: String))

Add the new column:

val newDf = df.withColumn("NAMESCORE", newUDF(df("FNAME"), df("LNAME")))
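If plain Levenshtein distance is what you need, a UDF may not be necessary: Spark SQL has had a built-in levenshtein column function in org.apache.spark.sql.functions since version 1.5. The sketch below assumes the DataFrame has String columns named FNAME and LNAME as in the question; the NameScore object and withNameScore method are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, levenshtein}

object NameScore {
  // Adds NAMESCORE = Levenshtein distance between FNAME and LNAME.
  // Assumes `df` contains String columns FNAME and LNAME.
  def withNameScore(df: DataFrame): DataFrame =
    df.withColumn("NAMESCORE", levenshtein(col("FNAME"), col("LNAME")))
}
```

Built-in functions are generally preferable to UDFs when one fits, because Spark can optimize them; a UDF is still the way to go for a custom distance measure.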
