I have the following Spark DataFrame, where every column except the primary-key column emp_id holds a map with keys 'from' and 'to' (either of which can be null). For each of those columns I want to compare 'from' and 'to' and add a new key 'change' to the map, whose value is: a) 'insert' if 'from' is null and 'to' is not null; b) 'delete' if 'from' is not null and 'to' is null; c) 'update' if both 'from' and 'to' are non-null and 'from' differs from 'to'.
Note: columns whose value is null remain untouched.
Important note: the type of these columns is not Map[String, String] but rather something like Map[String, Any], meaning the values can be other struct objects.
How can this be achieved in Scala?
|emp_id|emp_city |emp_name |emp_phone |emp_sal |emp_site |
|1 |null |[from -> Will, to -> Watson]|null |[from -> 1000, to -> 8000]|[from ->, to -> Seattle] |
|3 |null |[from -> Norman, to -> Nate]|null |[from -> 1000, to -> 8000]|[from -> CherryHill, to -> Newark]|
|4 |[from ->, to -> Iowa]|[from ->, to -> Ian] |[from ->, to -> 1004]|[from ->, to -> 8000] |[from ->, to -> Des Moines] |
Expected:
|emp_id|emp_city |emp_name |emp_phone |emp_sal |emp_site |
|1 |null |[from -> Will, to -> Watson, change -> update]|null |[from -> 1000, to -> 8000, change -> update]|[from ->, to -> Seattle, change -> insert] |
|3 |null |[from -> Norman, to -> Nate, change -> update]|null |[from -> 1000, to -> 8000, change -> update]|[from -> CherryHill, to -> Newark, change -> update]|
|4 |[from ->, to -> Iowa, change -> insert]|[from ->, to -> Ian, change -> insert] |[from ->, to -> 1004, change -> insert]|[from ->, to -> 8000, change -> insert] |[from ->, to -> Des Moines, change -> insert] |
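One possible sketch, assuming the map columns can be treated as (or cast to) map&lt;string,string&gt; so that map_concat (Spark 2.4+) can append a string-valued 'change' entry. The helper names changeType and withChange are my own, not from the question. For genuinely heterogeneous Map[String, Any] values (e.g. struct values), map_concat cannot mix value types, and you would instead need to restructure each column as a struct with from/to/change fields or use a UDF:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Pure classification rule, kept separate so the logic can be unit-tested
// without a SparkSession.
def changeType(from: Option[Any], to: Option[Any]): Option[String] =
  (from, to) match {
    case (None, Some(_))              => Some("insert")
    case (Some(_), None)              => Some("delete")
    case (Some(f), Some(t)) if f != t => Some("update")
    case _                            => None
  }

// Column-level version of the same rule, applied to every map column.
def withChange(df: DataFrame, keyCol: String = "emp_id"): DataFrame =
  df.columns.filter(_ != keyCol).foldLeft(df) { (acc, c) =>
    val from = col(c).getItem("from")
    val to   = col(c).getItem("to")
    // Falls through to null when none of the three cases applies.
    val change = when(from.isNull && to.isNotNull, "insert")
      .when(from.isNotNull && to.isNull, "delete")
      .when(from.isNotNull && to.isNotNull && from =!= to, "update")
    // Null map columns stay untouched; others get the 'change' key appended.
    acc.withColumn(c,
      when(col(c).isNull, col(c))
        .otherwise(map_concat(col(c), map(lit("change"), change))))
  }
```

Usage would be `val result = withChange(df)`. The foldLeft rewrites each map column in place, so the schema keeps the same column names.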