I am trying to compare the values of two columns by calculating the difference; however, I want to create a new column having the value based on a condition.
Suppose I have data like this,
Column1 | Column2 |
10 | 9 |
20 | 40 |
30 | 30 |
I want to create a data like,
Column1 | Column2 | Column3
10 | 9 | 9
20 | 40 | 20
30 | 30 | 30
The first row of Column3 is 9 because the Column2 - Column1 < 0.0 ,therefore, Column3 will have the same value of Column2. Second row is 20 because the difference is positive (Column2 - Column1 > 0.0),therefore, Column3 will have the same value of Column1. Third row, the difference is 0 so Column3 can either value of Column1 or Column2.
I am trying to implement it using pandas and I have written the following code but I get an error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Here's my code:
df = pd.read_excel('abc.xlsx')
if( df['Column2'] - df['Column1'] < 0.0 ):
df['Column3'] = df['Column2']
elif (df['Column2'] - df['Column1'] > 0.0) :
df['Column3'] = df['Column1']
else:
df['Column3'] = df['Column1']
print(df)
if
function is executed on a line by line basis; in Pandas/Numpy, vectorization implies working on chunks/the entire column in one go. as such, checking row by row(which you can certainly do with iterrows) wont be efficient. I'm hoping my explanation did not make it worse. Maybe this link can help – sammywemmy