I'm trying to apply a CASE WHEN expression to a DataFrame I have, but I'm getting an error. I want to implement it with the built-in Spark functions withColumn, when and otherwise:

CASE WHEN vehicle = "BMW"
  AND model IN ("2020", "2019", "2018", "2017")
  AND value > 100000 THEN 1
  ELSE 0 END AS NEW_COLUMN
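For comparison, here is a minimal sketch showing that the SQL CASE above can also be handed to Spark as a string through org.apache.spark.sql.functions.expr, which parses a SQL expression into a Column. Because the condition is parsed as SQL, Scala operator precedence never comes into play. It assumes Spark with Scala and columns literally named vehicle, model (a string) and value. The CaseWhenViaExpr object and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical helper: applies the CASE WHEN from the question as a SQL expression string.
object CaseWhenViaExpr {
  def withNewColumn(df: DataFrame): DataFrame =
    df.withColumn(
      "NEW_COLUMN",
      // expr runs the string through Spark's SQL parser and returns a Column
      expr(
        """CASE WHEN vehicle = 'BMW'
          |  AND model IN ('2020', '2019', '2018', '2017')
          |  AND value > 100000 THEN 1
          |  ELSE 0 END""".stripMargin)
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would reuse its existing session
    val spark = SparkSession.builder().master("local[*]").appName("case-when-expr").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; model is a string to match the quoted years in the SQL
    val df = Seq(
      ("BMW", "2019", 120000),
      ("BMW", "2015", 150000),
      ("Audi", "2020", 200000)
    ).toDF("vehicle", "model", "value")

    withNewColumn(df).show()
    spark.stop()
  }
}
```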

Currently I have this:

DF.withColumn(NEW_COLUMN, when(col(vehicle) === "BMW" 
and col(model) isin(listOfYears:_*) 
and col(value) > 100000, 1).otherwise(0))

But I'm getting a data type mismatch error (boolean and string). I understand that part of my condition evaluates to a boolean and part to a string, which is causing the error. What's the correct syntax for a case like this one? Also, I was using && instead of and, but the third && was giving me "cannot resolve symbol &&".

Thanks for the help!

So NEW_COLUMN, vehicle, model etc. are variables of type String? If so, this code runs fine. Do you have the implicits imported? - Raphael Roth

1 Answer

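I think && is correct: with the built-in Spark functions, all of the expressions are of type Column, and checking the API, Column defines &&, so it should work fine. It could be as simple as an order-of-operations issue, where you need parentheses around each of the boolean conditions (and isin called with a dot, as in col(model).isin(listOfYears: _*)). In Scala, a method used infix whose name starts with a letter, such as isin or and, has lower precedence than symbolic operators like ===, && and >, so col(model) isin(...) ends up grouped with the wrong operands. With and, everything is grouped left to right, so (col(vehicle) === "BMW") and col(model) is built first, which matches the boolean and string mismatch.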
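A minimal sketch of the parenthesized version, assuming Spark with Scala and columns named vehicle, model and value (if the names are held in String variables, as in the question, col(vehicle) works the same way). The CaseWhenDsl object and the contents of listOfYears are hypothetical. isin is called with dot notation so it binds directly to col("model"), and each comparison is wrapped in parentheses for readability.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object CaseWhenDsl {
  // Hypothetical list; assumed to be a Seq[String] like the question's listOfYears
  val listOfYears: Seq[String] = Seq("2020", "2019", "2018", "2017")

  def withNewColumn(df: DataFrame): DataFrame = {
    // Each line is a Column; a trailing && keeps Scala from inferring a semicolon,
    // and the dot call to isin avoids the low precedence of letter-named infix methods
    val condition: Column =
      (col("vehicle") === "BMW") &&
        col("model").isin(listOfYears: _*) &&
        (col("value") > 100000)

    // CASE WHEN condition THEN 1 ELSE 0 END
    df.withColumn("NEW_COLUMN", when(condition, 1).otherwise(0))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder().master("local[*]").appName("case-when-dsl").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows
    val df = Seq(
      ("BMW", "2019", 120000),
      ("BMW", "2015", 150000),
      ("Audi", "2020", 200000),
      ("BMW", "2018", 90000)
    ).toDF("vehicle", "model", "value")

    withNewColumn(df).show()
    spark.stop()
  }
}
```

Written with and instead of &&, the same condition also groups correctly once isin uses a dot, since === and > bind more tightly than any infix method whose name starts with a letter.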