2
votes

I have two pandas dataframes as follows.

import pandas as pd

list1 = [{'salt': 0.2, 'fat': 0.8}, {'fat': 1.0, 'protein': 0.9}]
df1 = pd.DataFrame(list1)
# Fill missing values with zeros
df1.fillna(0, inplace=True)

list2 = [{'salt': 0.1, 'sugar': 0.9}, {'oil': 0.9, 'sugar': 0.8, 'salt': 0.2}, {'protein': 0.9}]
df2 = pd.DataFrame(list2)
# Fill missing values with zeros
df2.fillna(0, inplace=True)

My two data frames look as follows.

df1:
   fat  protein  salt
0  0.8      0.0   0.2
1  1.0      0.9   0.0

df2:
   oil  protein  salt  sugar
0  0.0      0.0   0.1    0.9
1  0.9      0.0   0.2    0.8
2  0.0      0.9   0.0    0.0

Now I want to compare df1 and df2, find the columns (topics) that each one is missing, and fill them with zeros, so that the final dataframes look as follows.

df1:
   fat  protein  salt  oil  sugar
0  0.8      0.0   0.2   0    0
1  1.0      0.9   0.0   0    0

df2:
   oil  protein  salt  sugar  fat
0  0.0      0.0   0.1    0.9   0
1  0.9      0.0   0.2    0.8   0
2  0.0      0.9   0.0    0.0   0

I know how to do this within a single dataframe using df1.fillna(0, inplace=True). But how can it be done across two dataframes?

2 Answers

6
votes

Use pd.DataFrame.align with axis=1 so that only the columns are aligned and the row indexes are left alone. Pass fill_value=0 to fill the newly added columns with zeros.

df1, df2 = df1.align(df2, axis=1, fill_value=0)

df1

   fat  oil  protein  salt  sugar
0  0.8    0      0.0   0.2      0
1  1.0    0      0.9   0.0      0

df2

   fat  oil  protein  salt  sugar
0    0  0.0      0.0   0.1    0.9
1    0  0.9      0.0   0.2    0.8
2    0  0.0      0.9   0.0    0.0
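
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the align approach on the question's data. The names a1 and a2 are only illustrative, and join='outer' is spelled out even though it is the default. I'm assuming a reasonably recent pandas; the exact column order and dtypes may differ a little between versions.

```python
import pandas as pd

# Rebuild the two frames from the question; fillna(0) handles the gaps
# that already exist inside each frame.
df1 = pd.DataFrame([{'salt': 0.2, 'fat': 0.8},
                    {'fat': 1.0, 'protein': 0.9}]).fillna(0)
df2 = pd.DataFrame([{'salt': 0.1, 'sugar': 0.9},
                    {'oil': 0.9, 'sugar': 0.8, 'salt': 0.2},
                    {'protein': 0.9}]).fillna(0)

# axis=1 aligns the column labels only, so the row counts stay 2 and 3.
# fill_value=0 is used for every column that one frame lacked.
a1, a2 = df1.align(df2, join='outer', axis=1, fill_value=0)

print(a1)
print(a2)
```

Both results should end up with the same five columns (fat, oil, protein, salt, sugar), while each keeps its own number of rows.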

3
votes

Using df.reindex on the union of the two column sets. Not as elegant, but still going to post it anyway, since piR hasn't given you as many options this time!

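# Index.union is the explicit set union; the | operator on an Index no longer means that in recent pandas versions
c = df1.columns.union(df2.columns)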
df1 = df1.reindex(columns=c).fillna(0)
df2 = df2.reindex(columns=c).fillna(0)

df1

   fat  oil  protein  salt  sugar
0  0.8  0.0      0.0   0.2    0.0
1  1.0  0.0      0.9   0.0    0.0


df2

   fat  oil  protein  salt  sugar
0  0.0  0.0      0.0   0.1    0.9
1  0.0  0.9      0.0   0.2    0.8
2  0.0  0.0      0.9   0.0    0.0
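
The reindex idea also extends to more than two dataframes. Below is a rough sketch of that, not a definitive implementation: align_columns is a hypothetical helper name I made up, and it assumes a non-empty list of frames. It passes fill_value straight to reindex instead of chaining fillna. Note that fill_value only applies to columns that reindex newly creates, so NaNs already present in a frame still need their own fillna(0) beforehand.

```python
import pandas as pd

def align_columns(frames, fill_value=0):
    """Hypothetical helper: reindex every frame onto the union of all columns."""
    # Build the union of the column labels across all frames.
    all_cols = frames[0].columns
    for frame in frames[1:]:
        all_cols = all_cols.union(frame.columns)
    # Columns a frame did not have are created and filled with fill_value.
    return [frame.reindex(columns=all_cols, fill_value=fill_value)
            for frame in frames]

df1 = pd.DataFrame([{'salt': 0.2, 'fat': 0.8},
                    {'fat': 1.0, 'protein': 0.9}]).fillna(0)
df2 = pd.DataFrame([{'salt': 0.1, 'sugar': 0.9},
                    {'oil': 0.9, 'sugar': 0.8, 'salt': 0.2},
                    {'protein': 0.9}]).fillna(0)

r1, r2 = align_columns([df1, df2])
print(r1)
print(r2)
```

For exactly two frames, align is shorter; a helper like this only starts to pay off when there is a whole list of frames to bring onto the same columns.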