How can I compare two columns and print a new column containing either of the values (of the two columns) respectively according to a condition?

Question

I am trying to compare the values of two columns by calculating the difference; however, I want to create a new column having the value based on a condition.

Suppose I have data like this,

Column1 | Column2 | 
   10   |    9    |
   20   |    40   |
   30   |    30   |

I want to create a data like,

Column1 | Column2 | Column3
   10   |    9    |    9
   20   |    40   |   20
   30   |    30   |   30

The first row of Column3 is 9 because the Column2 - Column1 < 0.0 ,therefore, Column3 will have the same value of Column2. Second row is 20 because the difference is positive (Column2 - Column1 > 0.0),therefore, Column3 will have the same value of Column1. Third row, the difference is 0 so Column3 can either value of Column1 or Column2.

I am trying to implement it using pandas and I have written the following code but I get an error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here's my code:

df = pd.read_excel('abc.xlsx')

if( df['Column2'] - df['Column1'] < 0.0 ):
    df['Column3'] = df['Column2']

elif (df['Column2'] - df['Column1'] > 0.0) :
    df['Column3'] = df['Column1']

else:
    df['Column3'] = df['Column1']

print(df)

Your code checks for each row to see which conditions are met, however, since it is working on a series, it cant tell what is True/False, ... hence the ambiguity. Using, a vectorized approach however, Pandas can correctly interprete the True/False results and assign the right values accordingly. — sammywemmy
thanks very much for the explanation. May I ask why is it necessary for pandas to be able to tell what is True/False...it is just supposed to check the difference of two numbers and check whether it is greater/less/equal to 0 followed by operation. I am sorry if this is a dumb question, I am a noob. — silentwraith
when working within python, the if function is executed on a line by line basis; in Pandas/Numpy, vectorization implies working on chunks/the entire column in one go. as such, checking row by row(which you can certainly do with iterrows) wont be efficient. I'm hoping my explanation did not make it worse. Maybe this link can help — sammywemmy
wow, so your answer is focussed on efficiency too. I think I understand the usage a lot more now. Thanks again. Cheers. — silentwraith

sammywemmy sammywemmy · Accepted Answer · 2020-07-05T05:44:23

You can compare the two columns, then use numpy where to run a vectorised if/else :

 df["Column3"] = np.where(df.Column1.gt(df.Column2), df.Column2, df.Column1)

How can I compare two columns and print a new column containing either of the values (of the two columns) respectively according to a condition?

2 Answers