3 votes

I need to implement the NVL function in Spark while joining two DataFrames.

Input DataFrames:

ds1.show()
---------------
|key  | Code  |
---------------
|2    | DST   |
|3    | CPT   |
|null | DTS   |
|5    | KTP   |
---------------

ds2.show()
-----------------
|key  | PremAmt |
-----------------
|2    | 300     |
|-1   | -99     |
|5    | 567     |
-----------------

I need to implement the equivalent of "LEFT JOIN ON NVL(DS1.key, -1) = DS2.key". I wrote the join below, but without an NVL or coalesce the null key never matches ds2's -1 row, so it returns the wrong values.

How can I incorporate NVL in Spark DataFrames?

// nvl on the key is missing, so the null-key row gets no match
ds1.join(ds2, Seq("key"), "left_outer")

-------------------------
|key  | Code  |PremAmt  |
-------------------------
|2    | DST   |300      |
|3    | CPT   |null     |
|null | DTS   |null     |
|5    | KTP   |567      |
-------------------------

Expected result:

-------------------------
|key  | Code  |PremAmt  |
-------------------------
|2    | DST   |300      |
|3    | CPT   |null     |
|null | DTS   |-99      |
|5    | KTP   |567      |
-------------------------
4
ds1.na.fill(-1, Seq("key")).join(ds2, Seq("key"), "left_outer")? – philantrovert
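
A minimal sketch of that suggestion, assuming "key" is a numeric column. Note that, unlike the expected result above, the filled key would then show -1 instead of null in the output:

// Fill null keys with -1 before joining on the shared column
val joined = ds1.na.fill(-1, Seq("key")).join(ds2, Seq("key"), "left_outer")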

4 Answers

5 votes

I know one way, though it's a bit complex:

import org.apache.spark.sql.functions.{coalesce, lit}

val df = ds1.join(ds2, coalesce(ds1("key"), lit(-1)) === ds2("key"), "left_outer")

Both DataFrames still carry their own "key" column after this join, so you should rename the "key" column of one DataFrame, or drop the duplicate column after the join, as sketched below.
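
For example, a minimal sketch of the full join (reusing the imports above) that drops ds2's duplicate key column:

// ds1's key stays null in the output; ds2's matching copy is dropped
val joined = ds1
  .join(ds2, coalesce(ds1("key"), lit(-1)) === ds2("key"), "left_outer")
  .drop(ds2("key"))

joined.show()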

4 votes

An implementation of nvl in Scala:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{when, lit}

// Return replaceVal wherever colIn is null, otherwise colIn (SQL NVL semantics)
def nvl(colIn: Column, replaceVal: Any): Column =
  when(colIn.isNull, lit(replaceVal)).otherwise(colIn)

Now you can use nvl as you would any other function for DataFrame manipulation, e.g.

val newDf = df.withColumn("MyColNullsReplaced", nvl($"MyCol", "<null>"))

Obviously, replaceVal must be of the correct type; the example above assumes $"MyCol" is of type String.
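
Applied to the join from the question, a sketch using this nvl might look like:

// Treat null keys in ds1 as -1 so they match ds2's -1 row,
// then drop ds2's duplicate key column
val joined = ds1
  .join(ds2, nvl(ds1("key"), -1) === ds2("key"), "left_outer")
  .drop(ds2("key"))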

0 votes

The answer is to use NVL; this Python code works:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("CommonMethods").getOrCreate()

Note: the SparkSession is built in a "chained" fashion, i.e. three methods are applied on the same line.

Read the CSV file:

df = spark.read.csv('C:\\tableausuperstore1_all.csv', inferSchema='true', header='true')

df.createOrReplaceTempView("ViewSuperstore")

The view ViewSuperstore can now be used in SQL:

print("*trace1-nvl")

df = spark.sql("select nvl(state,'a') testString, nvl(quantity,0) testInt  from ViewSuperstore where state='Florida' and OrderDate>current_date() ")

df.show()

print("*trace2-FINAL")
0 votes

This worked for me:

import org.apache.spark.sql.functions.{col, coalesce, lit}

// Pass the identifying columns through unchanged and
// replace nulls in the dimension fields with 0
intermediateDF.select(col("event_start_timestamp"),
        col("cobrand_id"),
        col("rule_name"),
        col("table_name"),
        coalesce(col("dimension_field1"), lit(0)),
        coalesce(col("dimension_field2"), lit(0)),
        coalesce(col("dimension_field3"), lit(0)),
        coalesce(col("dimension_field4"), lit(0)),
        coalesce(col("dimension_field5"), lit(0))
      )
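
One caveat: without aliases, Spark names these columns after the whole expression (e.g. coalesce(dimension_field1, 0)). A sketch that keeps the original names:

// Alias each coalesced column back to its original name
intermediateDF.select(
  col("event_start_timestamp"),
  coalesce(col("dimension_field1"), lit(0)).alias("dimension_field1"),
  coalesce(col("dimension_field2"), lit(0)).alias("dimension_field2")
)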