how can we use Coalesce in python for multiple data frames using pandas

Question

How can we use Coalesce with multiple data frames.

columns_List = Emp_Id, Emp_Name, Dept_Id...

I have two data frames getting used in python script. df1[Columns_List] , df2[columns_List]. In both the dataframes i have same columns used but i will be having different values in both dataframes.

How can i use Coalesce so that lets say :In Dataframe df1[Columns_List] -- I have Emp_Name null then i want to pick Emp_Name from df2[Columns_list].

I am trying to create an output CSV file.

Please sorry if my framing of question is wrong..

Please find below sample data.

For Dataframe1 -- df1[Columns_List] .. Please find below output

EmpID,Emp_Name,Dept_id,DeptName
1,,1,
2,,2,

For Dataframe2 -- df2[Columns_List] .. Please find below output

EmpID,Emp_Name,Dept_id,DeptName
1,XXXXX,1,Sciece
2,YYYYY,2,Maths

I have source as Json file. Once i parse the data by python , i am using 2 dataframes in the same script. In Data frame 1 ( df1) i have Emp_Name & Dept_Name as null. In that case i want to pick data from Dataframe2 (df2).

In the above example i have provided few columns. But i may have n number of columns. but column ordering and column names will be always same. I am trying to achieve in such a way if any of the column from df1 is null then i want to pick value from df2.

Is that possible.. Please help me with any suggestionn...

@sammywemmy -- I have added some sample output data. Please let me know if i need to add any more details — Arya
Do you want to replace values which are Null or which are empty? By null I mean numpy.NaN, by empty could be an empty string. — Valentino
@Valentino .. Thanks a lot for the reply. Please consider it as empty.. — Arya

Valentino Valentino · Accepted Answer · 2019-06-30T15:17:51

You can use pandas.DataFrame.combine. This method does what you need: it builds a dataframe taking elements from two dataframes according to a custom function.
You can then write a custom function which picks the element from dataframe one unless that is null, in which case the element is taken from dataframe two.

Consider the two following dataframe. I built them according to your examples but with a small difference to emphatize that only emtpy string will be replaced:

columnlist = ["EmpID", "Emp_Name", "Dept_id", "DeptName"]

df1 = pd.DataFrame([[1, None, 1, np.NaN], [2, np.NaN, 2, None]], columns=columnlist)
df2 = pd.DataFrame([[1, "XXX", 2, "Science"], [2, "YYY", 3, "Math"]], columns=columnlist)

They are:

df1
   EmpID  Emp_Name  Dept_id  DeptName
0      1       NaN        1       NaN
1      2       NaN        2       NaN

df2
   EmpID Emp_Name  Dept_id DeptName
0      1      XXX        1  Science
1      2      YYY        3     Math

What you need to do is:

ddf = df1.combine(df2, lambda ss, rep_ss : pd.Series([r if pd.isna(x) else x for x, r in zip(ss, rep_ss)]))

to get ddf:

ddf
   EmpID Emp_Name  Dept_id DeptName
0      1      XXX        1  Science
1      2      YYY        2     Math

As you can see, only Null values in df1 have been replaced with the corresponding values in df2.

EDIT: A bit deeper explanation

Since I've been asked in the comments, let me give a bit of explanation more on the solution:

ddf = df1.combine(df2, lambda ss, rep_ss : pd.Series([r if pd.isna(x) else x for x, r in zip(ss, rep_ss)]))

Is a bit compact, but there is nothing much than some basic python techiques like list comprehensions, plus the use of pandas.DataFrame.combine. The pandas method is detailed in the docs I linked above. It compares the two dataframes column by column: the columns are passed to a custom function which must return a pandas.Series. This Series become a column in the returned dataframe.
In this case, the custom function is a lambda, which uses a list comprehension to loop over the pairs of elements (one from each column) and pick only one element of the pair (the first if not null, otherwise the second).

how can we use Coalesce in python for multiple data frames using pandas

3 Answers

EDIT: A bit deeper explanation