3
votes

I need to subtract two DataFrames with different indexes (which produces NaN wherever a row is missing from one of them), and I want to replace the missing values from each DataFrame with a different number (fill value). For example, let's say I have df1 and df2:

df1:

    A   B   C
0   0   3   0
1   0   0   4
2   4   0   2

df2:

    A   B   C
0   0   3   0
1   1   2   0
3   1   2   0

subtracted = df1.sub(df2):

    A   B   C
0   0   0   0
1   -1  -2  4
2   NaN NaN NaN
3   NaN NaN NaN

I want row 2 of subtracted to have the values from row 2 of df1, and row 3 of subtracted to have the value 5.

I expect:

subtracted:

    A   B   C
0   0   0   0
1   -1  -2  4
2   4   0   2
3   5   5   5

I tried using the sub method with fill_value=5, but then the same fill value is applied in both rows 2 and 3, so neither row matches what I expect.
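For anyone who wants to reproduce this, here is a minimal sketch of the setup. The constructor calls are an assumption, since the question only shows the printed frames; the variable names only_in_df1 and only_in_df2 are introduced here for illustration.

```python
import pandas as pd

# Rebuild the two example frames from the printed tables in the question.
df1 = pd.DataFrame({"A": [0, 0, 4], "B": [3, 0, 0], "C": [0, 4, 2]}, index=[0, 1, 2])
df2 = pd.DataFrame({"A": [0, 1, 1], "B": [3, 2, 2], "C": [0, 0, 0]}, index=[0, 1, 3])

# Labels present in only one frame have no partner, so the result is NaN there.
subtracted = df1.sub(df2)
print(subtracted)

# The two kinds of missing rows that need different fill values:
only_in_df1 = df1.index.difference(df2.index)  # label 2
only_in_df2 = df2.index.difference(df1.index)  # label 3
print(list(only_in_df1), list(only_in_df2))
```

The two index differences are what distinguish "keep df1's values" from "use the constant 5".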

3
I understand why you want row 2 to be as if df2 had a row 2 filled with 0, but what is the rationale behind the 5s in row 3? (If it has nothing to do with the original values, you can just assign to that row after .sub.) - Adam.Er8
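The suggestion in this comment could look roughly like the sketch below. It assumes the 5 is an unrelated constant, and it rebuilds df1 and df2 from the printed tables, so the constructor calls are an assumption rather than the asker's actual code.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 0, 4], "B": [3, 0, 0], "C": [0, 4, 2]}, index=[0, 1, 2])
df2 = pd.DataFrame({"A": [0, 1, 1], "B": [3, 2, 2], "C": [0, 0, 0]}, index=[0, 1, 3])

# Treat a row missing from either side as 0, so row 2 keeps df1's values.
subtracted = df1.sub(df2, fill_value=0)

# Overwrite the rows that exist only in df2 with the constant 5.
subtracted.loc[df2.index.difference(df1.index)] = 5

# Alignment produces floats, so cast back to integers for display.
print(subtracted.astype(int))
```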

3 Answers

3
votes

One way would be to reindex df2 setting fill_value to 0 before subtracting, then subtract and fillna with 5:

import pandas as pd
# Index.union replaces the deprecated `|` set operation on indexes
ix = pd.RangeIndex(df1.index.union(df2.index).max() + 1)
df1.sub(df2.reindex(ix, fill_value=0)).fillna(5).astype(df1.dtypes)

   A  B  C
0  0  0  0
1 -1 -2  4
2  4  0  2
3  5  5  5
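
The same idea can be wrapped in a small function. This is only a sketch: sub_with_fills and its parameter names are hypothetical, it reindexes to the union of the two indexes rather than a full RangeIndex, and it assumes neither frame already contains NaN (fillna would overwrite those too).

```python
import pandas as pd

def sub_with_fills(left, right, right_fill=0, result_fill=5):
    """Hypothetical helper: subtract `right` from `left` using two fill values."""
    # Give `right` a row of `right_fill` for every label it lacks.
    full_index = left.index.union(right.index)
    diff = left.sub(right.reindex(full_index, fill_value=right_fill))
    # Rows absent from `left` are still NaN here; fill them with the constant
    # and restore the original column dtypes.
    return diff.fillna(result_fill).astype(left.dtypes)

df1 = pd.DataFrame({"A": [0, 0, 4], "B": [3, 0, 0], "C": [0, 4, 2]}, index=[0, 1, 2])
df2 = pd.DataFrame({"A": [0, 1, 1], "B": [3, 2, 2], "C": [0, 0, 0]}, index=[0, 1, 3])

result = sub_with_fills(df1, df2)
print(result)
```

Using the union of the indexes means no rows are created for labels that appear in neither frame, which matters if the indexes are not contiguous integers.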
1
votes

We have to reindex both frames here to get aligned indices. This way we can use the sub method.

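import numpy as np

# Cover the labels of both frames, not only df2's
idxmin = min(df1.index.min(), df2.index.min())
idxmax = max(df1.index.max(), df2.index.max())
idx = np.arange(idxmin, idxmax + 1)

df1.reindex(idx).sub(df2.reindex(idx).fillna(0)).fillna(5)

     A    B    C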
0  0.0  0.0  0.0
1 -1.0 -2.0  4.0
2  4.0  0.0  2.0
3  5.0  5.0  5.0
0
votes

I found the combine_first method, which almost satisfies my needs:

df2.combine_first(df1).sub(df2, fill_value=0)

but it still gets rows 1 and 3 wrong, producing:

    A   B   C
0   0   0   0
1   0   0   0
2   4   0   2
3   0   0   0
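
A short sketch of why this happens, printing the intermediate frame. The constructor calls are an assumption based on the printed tables, and the names combined and result are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0, 0, 4], "B": [3, 0, 0], "C": [0, 4, 2]}, index=[0, 1, 2])
df2 = pd.DataFrame({"A": [0, 1, 1], "B": [3, 2, 2], "C": [0, 0, 0]}, index=[0, 1, 3])

# combine_first prefers df2's values and uses df1 only where df2 has no row.
combined = df2.combine_first(df1)
print(combined)

# Rows 0, 1 and 3 of `combined` are df2's own rows, so subtracting df2 cancels
# them to 0. Row 2 comes from df1 and has 0 (the fill value) subtracted from it.
result = combined.sub(df2, fill_value=0)
print(result)
```

So df1's values in row 1 never enter the calculation, and row 3 is df2 minus itself rather than the constant 5.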