python - Why isn't my Pandas 'apply' function referencing multiple columns working?

Question

I have some problems with the Pandas apply function when using multiple columns with the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': np.random.randn(6)})

and the following function

def my_test(a, b):
    return a % b

When I try to apply this function with:

df['Value'] = df.apply(lambda row: my_test(row[a], row[c]), axis=1)

I get the error message:

NameError: ("global name 'a' is not defined", u'occurred at index 0')

I do not understand this message; I believe I defined the name properly.

I would highly appreciate any help on this issue.

Update

Thanks for your help. I did indeed make some syntax mistakes in the code: the column names should be put in quotes (''). However, I still get the same issue using a more complex function such as:

def my_test(a):
    cum_diff = 0
    for ix in df.index():
        cum_diff = cum_diff + (a - df['a'][ix])
    return cum_diff

Avoid using apply as much as possible. If you're not sure you need to use it, you probably don't. I recommend taking a look at "When should I ever want to use pandas apply() in my code?". - cs95

This is just about syntax errors when referencing a dataframe column, and about why functions need arguments. As to your second question, the function my_test(a) doesn't know what df is, since it wasn't passed in as an argument (unless df is supposed to be a global, which would be terrible practice). You need to pass all the values you'll need inside a function as arguments (preferably in order); otherwise, how would the function know where df comes from? Also, it's bad practice to program in a namespace littered with global variables, because you won't catch errors like this. - smci
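
A minimal sketch of what the comment above suggests, assuming the updated my_test is meant to sum the differences between one value and every entry of column 'a'. Note that df.index is an attribute, not a method, so calling df.index() as in the question fails on its own. The `values` parameter and the use of `args=` are illustrative choices, not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': np.random.randn(6)})

def my_test(a, values):
    # `values` is passed in explicitly instead of reading a global df
    cum_diff = 0.0
    for v in values:
        cum_diff += a - v
    return cum_diff

# Series.apply forwards extra positional arguments through args=
df['Value'] = df['a'].apply(my_test, args=(df['a'],))

# The same quantity without a Python loop: n * a - sum(a)
vectorized = len(df) * df['a'] - df['a'].sum()
```

Passing the column in as an argument keeps the function independent of any global name, and the closed form on the last line may be preferable when the frame is large.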

waitingkuo · Accepted Answer · 2013-05-03T08:40:31

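It seems you forgot the quotes ('') around the column names, so Python looked for variables named a and c instead of the strings 'a' and 'c'.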

In [43]: df['Value'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)

In [44]: df
Out[44]:
                    a    b         c     Value
          0 -1.674308  foo  0.343801  0.044698
          1 -2.163236  bar -2.046438 -0.116798
          2 -0.199115  foo -0.458050 -0.199115
          3  0.918646  bar -0.007185 -0.001006
          4  1.336830  foo  0.534292  0.268245
          5  0.976844  bar -0.773630 -0.570417

BTW, in my opinion, the following way is more elegant:

In [53]: def my_test2(row):
....:     return row['a'] % row['c']
....:     

In [54]: df['Value'] = df.apply(my_test2, axis=1)
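
In line with the comment on the question about avoiding apply, this particular function can probably be computed without apply at all, since % works elementwise on whole columns. A sketch, assuming the same df as in the question (the name via_apply is only for comparison and is not from the original post):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': np.random.randn(6)})

# Elementwise modulo on whole columns; no row-by-row Python calls
df['Value'] = df['a'] % df['c']

# Row-wise apply version from the answer, kept only for comparison
via_apply = df.apply(lambda row: row['a'] % row['c'], axis=1)
```

Both versions should give the same values; the column-wise form is usually faster because it avoids building a Series for every row.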
