1
votes

I have a data frame like this,

 df
 col1    col2    col3    col4
  1       3       4       2
  4       6       7       7
  3       6       3       3

Now I want to find out how many values are less than col4 values and store the count in another column, for example in first row there are 1 value less than 2 so the new value will be 1, the final data frame should look like,

col1    col2    col3    col4    col5
  1       3       4       2       1
  4       6       7       7       2
  3       6       3       3       0

I could do this using a for loop by comparing the row values, but the execution time will be huge. I am looking for any pandas shortcut or any other method to do it most efficiently.

4

4 Answers

1
votes

We can run the computation within numpy, using numpy less and newaxis, compare each row, sum the boolean result, and assign to col5 :

df['col5'] = np.sum(np.less(df.to_numpy()[:,:-1],
                            df.col4.to_numpy()[:,np.newaxis]),
                    axis=1)

df

  col1  col2    col3    col4    col5
0   1   3        4        2     1
1   4   6        7        7     2
2   3   6        3        3     0
1
votes

Do with lt and pass axis=0

df['new']=df.lt(df.col4,axis=0).sum(1)
df
   col1  col2  col3  col4  new
0     1     3     4     2    1
1     4     6     7     7    2
2     3     6     3     3    0
1
votes

Use df.lt and sum

df[['col1','col2','col3']].lt(df.col4.tolist(), axis=0).sum(1)

Out[76]:
0    1
1    2
2    0
dtype: int64
1
votes

We can make use of broadcasting by making use of DataFrame.lt over axis=0:

df['col5'] = df.loc[:, :'col3'].lt(df['col4'], axis=0).sum(axis=1)

   col1  col2  col3  col4  col5
0     1     3     4     2     1
1     4     6     7     7     2
2     3     6     3     3     0

Or you can make explicit use of numpy broadcasting:

df['col5'] = np.sum(df.loc[:, :'col3'] < df['col4'][:, None], axis=1)