Python How to find average of columns using dataframes apply method

Question

This is a question on Udacity Data Science Nanodegree and I can't figure it out. The instructions are:

Using the dataframe's apply method, create a new Series called avg_medal_count that indicates the average number of gold, silver, and bronze medals earned amongst countries who earned at least one medal of any kind at the 2014 Sochi Olympics.

The code I have currently is:

import numpy
from pandas import DataFrame, Series

def avg_medal_count():

 countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
                 'Netherlands', 'Germany', 'Switzerland', 'Belarus',
                 'Austria', 'France', 'Poland', 'China', 'Korea', 
                 'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
                 'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
                 'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']

    gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
    bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

    olympic_medal_counts = {'country_name':countries,
                            'gold': Series(gold),
                            'silver': Series(silver),
                            'bronze': Series(bronze)}    
    df = DataFrame(olympic_medal_counts)


# YOUR CODE HERE


    return avg_medal_count

I have tried a couple different things such as:

avg_medal_count = df.apply(numpy.mean), but get the error saying it could not convert the first column to numeric which makes sense because the first column is a list of countries. How can I use df.apply on only gold, silver and bronze columns? I have tried other variations, but nothing worked. I am pretty sure that I need to use a combination of df.apply and numpy.mean, because that is what I just learned about. Any thoughts?

Thanks!

Well what are the restrictions here? I mean you could just do df[['gold','silver','bronze']].apply(np.mean)) or you could filter the column selection by dtype, the other thing is that you need to also filter the df such that at least one medal of each type has been earned also as this is an exercise for the reader you should tackle this one problem at a time — EdChum

Andrew Andrew · Accepted Answer · 2015-04-07T21:21:48

I would first modify how you import the data to:

df = DataFrame(olympic_medal_counts).set_index('country_name')

I would then calculate a new column containing the sum of the rows for the toal number of medals per country.

df['medal total'] = df.sum(axis=1)

Results:

                   bronze  gold  silver  medal total
country_name                                     
Russian Fed.         9    13      11           33
Norway              10    11       5           26
Canada               5    10      10           25
United States       12     9       7           28
Netherlands          9     8       7           24
Germany              5     8       6           19
Switzerland          2     6       3           11
Belarus              1     5       0            6
Austria              5     4       8           17
France               7     4       4           15
Poland               1     4       1            6
China                2     3       4            9
Korea                2     3       3            8
Sweden               6     2       7           15
Czech Republic       2     2       4            8
Slovenia             4     2       2            8
Japan                3     1       4            8
Finland              1     1       3            5
Great Britain        2     1       1            4
Ukraine              1     1       0            2
Slovakia             0     1       0            1
Italy                6     0       2            8
Latvia               2     0       2            4
Australia            1     0       2            3
Croatia              0     0       1            1
Kazakhstan           1     0       0            1

Finally, subset the the DataFrame for rows with medal totals greater than or equal to 1 and find the average of the columns.

df[df['medal total'] >= 1].apply(np.mean)

Results:

bronze          3.807692
gold            3.807692
silver          3.730769
medal total    11.346154

This could also be accomplished in one line using:

df[ df.sum(axis=1) >= 1 ].apply(np.mean)

Python How to find average of columns using dataframes apply method

5 Answers