I have a simple dataset in a DataFrame with the columns attendance, week, and years.
attendance  week  years
37440       Sun   2010-04-04
43504       Mon   2010-04-05
38935       Mon   2010-04-05
40052       Mon   2010-04-05
43510       Tue   2010-04-06
38000       Tue   2010-04-06
10090       Tue   2010-04-06
41533       Wed   2010-04-07
I would like to draw a scatter plot. There are many attendance values for each day, so I would like to average them per day and show the averages on the plot.
I saw this approach in another post and tried it, but it gave an error. Here is my code:
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']
log_2010 = pd.read_excel('GL2010-2017.xlsx')
year = log_2010['years']
attendance = log_2010['attendace']
week = log_2010['day_of_week']
df = pd.DataFrame({
    'years': year,
    'attendance': attendance,
    'week': week
})
new_df = df.dropna(how='any')
new_df['years'] = pd.to_datetime(year, format='%Y%m%d')
df['week'] = pd.Categorical(new_df['week'], categories=days)
df[['week', 'attendance']].groupby('week').mean().plot.scatter(df['week'], df['attendance'])
I get this error:
KeyError: "['Sun' 'Mon' 'Mon' ... 'Sun' 'Sun' 'Sun'] not in index"
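For reference, here is a minimal sketch of what I am trying to get, using only the sample rows above instead of the Excel file. As far as I understand, `DataFrame.plot.scatter` expects column names for `x` and `y`, so passing whole Series makes pandas look the values up as labels, which would explain the KeyError. Also, after `groupby('week').mean()` the `week` values are the index, not a column. The `plot_mean_attendance` helper is a hypothetical name of my own, and plotting through matplotlib directly is an assumption on my side, not something from the other post.

```python
import pandas as pd

# Sample rows copied from the table above; the real data comes from the Excel file.
df = pd.DataFrame({
    "attendance": [37440, 43504, 38935, 40052, 43510, 38000, 10090, 41533],
    "week": ["Sun", "Mon", "Mon", "Mon", "Tue", "Tue", "Tue", "Wed"],
})

# Same day labels as in my code; they must match the spelling used in the data.
days = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]

# Ordered categorical so the groups sort Mon..Sun instead of alphabetically.
df["week"] = pd.Categorical(df["week"], categories=days, ordered=True)

# Average attendance per day. observed=True skips days with no rows,
# and reset_index() turns the 'week' index back into a regular column.
mean_attendance = (
    df.groupby("week", observed=True)["attendance"]
    .mean()
    .reset_index()
)


def plot_mean_attendance(means):
    """Scatter plot of mean attendance per day (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Day labels as strings on the x axis, averaged attendance on the y axis.
    ax.scatter(means["week"].astype(str), means["attendance"])
    ax.set_xlabel("day of week")
    ax.set_ylabel("mean attendance")
    return ax
```

Calling `plot_mean_attendance(mean_attendance)` followed by `plt.show()` should then draw one point per day that has data.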