0
votes

I need to update value of dataframe column based on a string that isn't part of any other column in the dataframe. How do I do this?

For e.g. Let's say my dataframe has column A, B, C. I want to update value of column C based on combination of value in column A & a static string. I tried to do the following.

val df = originalDF.withColumn("C", Helper.dudf(df("A"), lit("str")))

My helper class as following

val addDummyColumn :(String, String)=>String=(input:String, recordType: String)=>{input}

val dummyUDF = udf(addDummyColumn)

My UDF that takes in variable A & recordType:

if(recordType.equals("TRANSACTION") {
 if(A > 0 ) return "CHARGE";
   else return "REFUND"
} else if (recordType.equals("CHARGEBACK") {
    return "CHARGEBACK"
}

Example Input & Output:

Sample Input:
A=10, recordType=TRANSACTION
Output: C = CHARGE
A=-10, recordType=TRANSACTION
C = REFUND

A=10, recordType=CHARGEBACK
C = CHARGEBACK

My problem is that withColumn only accepts Column so I did lit("str") but I don't know how to extract value of that column in my UDF. Ideas?

2
Can you add sample input and expected output?philantrovert
can you elaborate what do you mean by I want to update value of column C based on combination of value in column A & a static string with an example?Ramesh Maharjan
I've updated the question with an example.Nick01
Do you also pass the recordType as lit("TRANSACTION") or is this another columnkoiralo
@ShankarKoirala I pass it as lit("TRANSACTION")Nick01

2 Answers

1
votes

If column A is a IntegerType then you can define the udf function as

val recordType: String = //"TRANSACTION" or "CHARGEBACK"
import org.apache.spark.sql.functions._
val dummyUDF = udf((A: Int, recordType: String) => {
  if(recordType.equals("TRANSACTION")){
    if(A > 0) "CHARGE" else "REFUND"
  } else if (recordType.equals("CHARGEBACK"))
    "CHARGEBACK"
  else
    "not known"
})

val df = originalDF.withColumn("C", dummyUDF(originalDF("A"), lit(recordType)))
1
votes

This is how you can use udf and pass the columns and static strings

 val addDummy = udf((A : String, recordType: String) => {
if(recordType.equals("TRANSACTION")) {
  if(A.toInt > 0 ) 
    "CHARGE" 
  else
    "REFUND"
}else if (recordType.equals("CHARGEBACK")) {
  "CHARGEBACK"
}else
  "NONE"
  })

Now call the udf as below

val newDF = df.withColumn("newCol", addDummy($"A", lit("TRANSACTION")))

Hope this helps!