
I have a simple dataset in a dataframe with the columns years, attendance, and week.

attendance    week    years
37440         Sun     2010-04-04
43504         Mon     2010-04-05
38935         Mon     2010-04-05
40052         Mon     2010-04-05
43510         Tue     2010-04-06
38000         Tue     2010-04-06
10090         Tue     2010-04-06
41533         Wed     2010-04-07

I would like to draw a scatter plot. I have many attendance values for each day, and I would like to average them per day and show the averages on the scatter plot.
I saw this approach in another post and tried it, but it gave an error. Here is my code:

import pandas as pd

days=['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']
log_2010=pd.read_excel('GL2010-2017.xlsx')

year=log_2010['years']
attendance=log_2010['attendace']
week=log_2010['day_of_week']
df=pd.DataFrame({
    'years':year,
    'attendance':attendance,
    'week':week
})
new_df=df.dropna(how='any')
new_df['years']=pd.to_datetime(year,format='%Y%m%d')
df['week'] = pd.Categorical(new_df['week'], categories=days)


df[['week', 'attendance']].groupby('week').mean().plot.scatter(df['week'],df['attendance'])

I get this error:
KeyError: "['Sun' 'Mon' 'Mon' ... 'Sun' 'Sun' 'Sun'] not in index"

Please include the complete error message. - DYZ
This is the complete error message. - Muhammad Ahmed

1 Answer


Try setting week as the index when you import the data from the Excel file: log_2010=pd.read_excel('GL2010-2017.xlsx', index_col='week')
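
As a side note, the KeyError most likely comes from the plot.scatter call itself. DataFrame.plot.scatter expects the names of columns for x and y, not Series, so passing df['week'] makes pandas look for labels 'Sun', 'Mon', ... in the index. In addition, after groupby('week').mean() the week values are the index rather than a column. Below is a minimal sketch of one way to average per day and plot the result. It assumes the sample rows from the question instead of the Excel file, and the helper name plot_mean_attendance is made up for illustration.

```python
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']

# Sample rows copied from the question; real data would come from pd.read_excel.
df = pd.DataFrame({
    'attendance': [37440, 43504, 38935, 40052, 43510, 38000, 10090, 41533],
    'week': ['Sun', 'Mon', 'Mon', 'Mon', 'Tue', 'Tue', 'Tue', 'Wed'],
    'years': pd.to_datetime([
        '2010-04-04', '2010-04-05', '2010-04-05', '2010-04-05',
        '2010-04-06', '2010-04-06', '2010-04-06', '2010-04-07',
    ]),
})

# Ordered categorical so the days sort Mon..Sun instead of alphabetically.
df['week'] = pd.Categorical(df['week'], categories=days, ordered=True)

# Mean attendance per day. observed=True skips days that have no rows,
# and reset_index() turns 'week' back into a regular column.
means = (
    df.groupby('week', observed=True)['attendance']
      .mean()
      .reset_index()
)


def plot_mean_attendance(means):
    """Hypothetical helper: scatter of mean attendance per day (needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Pass the values directly to matplotlib; day names become categorical ticks.
    ax.scatter(means['week'].astype(str), means['attendance'])
    ax.set_xlabel('week')
    ax.set_ylabel('mean attendance')
    return ax
```

Calling plot_mean_attendance(means) and then plt.show() should draw one point per day. If you prefer the pandas plotting API, the key point is the same: give it column names, for example x='week' and y='attendance', rather than the Series themselves.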