9
votes

I have a pandas lookup table which looks like this

Grade   Lower_Boundary  Upper_Boundary
1   -110    -96
2   -96 -91
3   -91 -85
4   -85 -81
5   -81 -77
6   -77 -72
7   -72 -68
8   -68 -63
9   -63 -58
10  -58 -54
11  -54 -50
12  -50 -46
13  -46 -42
14  -42 -38
15  -38 -34
16  -34 -28
17  -28 -18
18  -18 -11
19  -11 -11
20  -11 -9

I have another pandas dataframe that looks contains score. I want to assign 'Grade' to the score column, by looking up the look up table. So based on which interval of lower and upper boundary the score falls, the grade should be assigned from that row in the lookup table. Is there a way to do it without typing a bunch of if then else statements? I am thinking just of excel's index match.

Score   Grade
-75 6
-75 6
-60 9
-66 8
-66 8
-98 1
-60 9
-82 4
-70 7
-60 9
-60 9
-60 9
-56 10
-70 7
-70 7
-70 7
-66 8
-56 10
-66 8
-66 8
1
Iterate over the table, check whether the score is within the threshold, if so, you know your grade, otherwise continue - Sebastian Hoffmann
If you can, please you share the code? - Zenvega
I'm not familiar with pandas, but it would utterly suprise me, if it doesn't provide a way to iterate over its data structures. - Sebastian Hoffmann
I know how to iterate over the score table but using if then else conditions for the grade lookup. I was wondering if there was a more efficient way of coding that would reduce the number of lines of code. If you are familiar with excel index match, that is what I am looking for.. - Zenvega
what columns contains your second DF: Score and Grade? - MaxU

1 Answers

11
votes

A one-line solution (I call your lookup table lookup):

df['Score'].apply(lambda score: lookup['Grade'][(lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)].values[0])

Explanation:

For a given score, here is how to find the grade:

score = -75
match = (lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)
grade = lookup['Grade'][match]

This return a series of length 1. You can get its value with, for instance:

grade.values[0]

All you need to do is apply the above to the score column. If you want a one-liner, use a lambda function:

df['Score'].apply(lambda score: lookup['Grade'][(lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)].values[0])

Otherwise the following would be more readable:

def lookup_grade(score):
    match = (lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)
    grade = lookup['Grade'][match]
    return grade.values[0]

df['Score'].apply(lookup_grade)

This approach would also make it easier to deal with cases when no match is found.