
I need to add a new column to my dataframe based on a condition on column 3. I get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Any help is much appreciated! My file looks like this:

2017;Jan;TURKRETAIL;7000007;[ICP None];ClosingBal_Input;Cost;Retail;LocalGAAPInput;C; 14,947.00 
2017;Jan;TURKRETAIL;5103001;[ICP None];ClosingBal_Input;BS_Input;Retail;LocalGAAPInput;C; 90,798.00 
2017;Jan;TURKRETAIL;7000002;[ICP None];ClosingBal_Input;BS_Input;Retail;LocalGAAPInput;D; 8,500.00 
2017;Jan;TURKRETAIL;2769601;[ICP None];NOPRODUCT;Operations;0;LocalGAAPInput;D; 684.00

And the code:

import pandas as pd
file_to_open=("C:\\Users\\yannis\\py_script\\py2\\tst1.txt")
file_output=("C:\\Users\\yannis\\py_script\\py2\\tst2.txt")

df = pd.read_csv(file_to_open,sep=";",encoding="utf8",header=None)

df[3] = df[3].astype(str).str.strip()

def f(self):
    if  df[3].str.startswith('2',na=False):
        val = 'Mvmts_NetIncome'
    elif df[3].str.startswith('5',na=False):
        val = 'Other'
        return val
    df[11] = df.apply(f,axis=1)

df.to_csv (file_output,sep=";",encoding="utf8",header=None)
print(df)

2 Answers

0
votes

Try this:

import pandas as pd
file_to_open=("C:\\Users\\yannis\\py_script\\py2\\tst1.txt")
file_output=("C:\\Users\\yannis\\py_script\\py2\\tst2.txt")

df = pd.read_csv(file_to_open,sep=";",encoding="utf8",header=None)

df[3] = df[3].astype(str).str.strip()

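def f(row): # give f an argument: apply passes in one row at a time
    val = '' # default when column 3 starts with neither '2' nor '5'
    if row[3].startswith('2'): # use the row argument, not df; row[3] is a plain string, so no .str
        val = 'Mvmts_NetIncome'
    elif row[3].startswith('5'):
        val = 'Other'
    return val # return once, after both branches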

df[11] = df.apply(f,axis=1) #this shouldn't be inside the function

df.to_csv (file_output,sep=";",encoding="utf8",header=None)
print(df)
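
For context on the original error, here is a minimal sketch of why it appears. The two values in the Series are taken from the sample file; the variable names are only illustrative. Calling .str.startswith on a whole column gives back a boolean Series with one entry per row, and a plain if cannot reduce that to a single True or False:

```python
import pandas as pd

# Two account codes from the sample file, as strings.
codes = pd.Series(["2769601", "5103001"])

# .str.startswith on a whole column returns one boolean per row.
mask = codes.str.startswith("2", na=False)
print(mask.tolist())

# Using that Series directly in an `if` raises the ValueError from the question.
try:
    if mask:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)
```

Inside a function passed to df.apply(..., axis=1), row[3] is a single string instead, so the condition evaluates to one boolean and the if works.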

0
votes

If you want to apply a function to each row, the function must receive the row as a Series. Each element row_serie[3] is then a plain string, so there is no need to use the .str accessor:

def f(row_serie):
    val = ''
    if row_serie[3].startswith('2'):
        val = 'Mvmts_NetIncome'
    elif row_serie[3].startswith('5'):
        val = 'Other'
    return val

df[11] = df.apply(f,axis=1)

Consider choosing a meaningful default value for the case where column 3 starts with neither 2 nor 5.
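
As a rough end-to-end sketch of the above, the snippet below runs the row-wise function on the four sample rows from the question. io.StringIO stands in for the real file, and the function name classify and the default label 'Unclassified' are placeholders I made up, so swap in whatever suits your data:

```python
import io
import pandas as pd

# The four sample rows from the question; StringIO stands in for tst1.txt.
sample = (
    "2017;Jan;TURKRETAIL;7000007;[ICP None];ClosingBal_Input;Cost;Retail;LocalGAAPInput;C; 14,947.00 \n"
    "2017;Jan;TURKRETAIL;5103001;[ICP None];ClosingBal_Input;BS_Input;Retail;LocalGAAPInput;C; 90,798.00 \n"
    "2017;Jan;TURKRETAIL;7000002;[ICP None];ClosingBal_Input;BS_Input;Retail;LocalGAAPInput;D; 8,500.00 \n"
    "2017;Jan;TURKRETAIL;2769601;[ICP None];NOPRODUCT;Operations;0;LocalGAAPInput;D; 684.00\n"
)

df = pd.read_csv(io.StringIO(sample), sep=";", header=None)
df[3] = df[3].astype(str).str.strip()

def classify(row, default="Unclassified"):
    """Label one row by the first digit of column 3 (default is a placeholder)."""
    code = row[3]  # a plain Python string for this row
    if code.startswith("2"):
        return "Mvmts_NetIncome"
    if code.startswith("5"):
        return "Other"
    return default  # neither '2' nor '5'

df[11] = df.apply(classify, axis=1)
print(df[[3, 11]])
```

With the sample data, the two rows whose code starts with 7 get the default label, which makes it easy to spot accounts that no rule covers.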