0
votes

I am using the Kaggle NFL Data dataset and am trying to filter it based on the number of pass attempts per player. I read all of the data into the variable all_nfl_data, and then I would like to do something like this:

all_pass_plays = all_nfl_data[all_nfl_data.PlayType == 'Pass']
passers_under_100 = all_pass_plays.groupby('Passer').transform('size') <= 100

I cannot figure out how to filter correctly based on the above logic. I am trying to keep only the players who have fewer than 100 pass attempts in total. The goal is to filter the full dataframe based on this number, not just to return the player names. Appreciate the help :)

2 Answers

1
votes

You can do this with isin (PS: I tried to stay close to your code while fixing it):

all_pass_plays = all_nfl_data[all_nfl_data.PlayType == 'Pass']
passers_under_100 = all_pass_plays.groupby('Passer').size() <= 100
afterfilterdf = all_nfl_data[all_nfl_data['Passer'].isin(passers_under_100[passers_under_100].index)]
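
To make the isin approach easier to check, here is a minimal sketch on made-up data. The toy dataframe, the names attempts and low_volume_passers, and the threshold of 2 (standing in for 100) are assumptions for illustration only; the column names PlayType and Passer are taken from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the Kaggle play-by-play table.
all_nfl_data = pd.DataFrame({
    "PlayType": ["Pass", "Pass", "Pass", "Run", "Pass", "Run"],
    "Passer":   ["A",    "A",    "A",    None,  "B",    None],
})

threshold = 2  # small stand-in for the 100-attempt cutoff

# Count pass attempts per passer (one row per pass play).
all_pass_plays = all_nfl_data[all_nfl_data["PlayType"] == "Pass"]
attempts = all_pass_plays.groupby("Passer").size()

# Names of the passers at or under the threshold.
low_volume_passers = attempts[attempts <= threshold].index

# Keep every row of the full frame that belongs to one of those passers.
filtered = all_nfl_data[all_nfl_data["Passer"].isin(low_volume_passers)]
print(filtered)
```

Note that rows with no Passer value (for example run plays) never match isin, so they are dropped from the result as well.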
1
votes

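An alternative solution in one line, using groupby().filter(). Note that it filters all_pass_plays, so the result contains only pass plays: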

passers_under_100 = all_pass_plays.groupby('Passer').filter(lambda x: x['Passer'].size <= 100)

Corresponding documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.filter.html
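
As a rough sketch of how groupby().filter() behaves, the example below runs it on made-up data and compares it with a row-aligned mask built from transform("size"), which is what the question originally attempted. The toy dataframe, the threshold of 2 (standing in for 100), and the names via_filter, group_sizes, and via_transform are assumptions for illustration. Both results are subsets of all_pass_plays, not of the full dataframe, because the mask from transform is indexed like all_pass_plays.

```python
import pandas as pd

# Hypothetical toy data standing in for the Kaggle play-by-play table.
all_nfl_data = pd.DataFrame({
    "PlayType": ["Pass", "Pass", "Pass", "Run", "Pass", "Run"],
    "Passer":   ["A",    "A",    "A",    None,  "B",    None],
})

threshold = 2  # small stand-in for the 100-attempt cutoff
all_pass_plays = all_nfl_data[all_nfl_data["PlayType"] == "Pass"]

# filter() keeps every row of each group for which the function returns True.
via_filter = all_pass_plays.groupby("Passer").filter(lambda g: len(g) <= threshold)

# transform("size") repeats each group's size on every row of that group,
# giving a mask aligned with all_pass_plays (not with all_nfl_data).
group_sizes = all_pass_plays.groupby("Passer")["PlayType"].transform("size")
via_transform = all_pass_plays[group_sizes <= threshold]

print(via_filter)
print(via_transform)
```

On larger data the transform-based mask is often the faster of the two, since filter() calls the Python function once per group, but that is worth measuring on the real dataset.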