
I'm attempting to use an if statement to match the country of origin of marathon winners to their country's GDP data. I am getting the error 'Can only compare identically-labeled Series objects'.

if df['Winner Country'] ==  gdp_data['Country']:

    if df['YEAR'] == 1970 :

        df['gdp'] = gdp_data['1970 gdp/cap'] 

gdp_data example:

Country 1970 gdp/cap    
Kenya   98  

df example:

YEAR    Winner_Name Winner_Country  Time    Gender  
1977    Dan Cloeter USA             2:17:52 M   

I intend to assign a GDP value to df based on both country and year (I only included partial data; there are extra columns for each year in the gdp_data DataFrame).

If I merge the two DataFrames instead, I run into this issue:

data example:

YEAR    Winner_Name    Winner_Country   Time    Gender  Marathon_City   Country 1970    1971     
1977    Dan Cloeter    USA              2:17:52 M       Chicago         USA     5247.0  5687.0  
1978    Mark Stanforth USA              2:19:20 M       Chicago         USA     5247.0  5687.0

As seen, the number 1970 is a column name but is also a possible value of YEAR. How can I create a gdp column based on the year the race occurred?
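Because the years are column labels in gdp_data but values in df, one option is to reshape gdp_data into long format (one row per country and year) with `DataFrame.melt` and then merge on both country and year. A minimal sketch, assuming the year columns are labelled with strings such as "1977" and using made-up GDP values:

```python
import pandas as pd

# Placeholder frames shaped like the question's data (GDP values are made up).
# gdp_data is wide: one column per year, labelled as strings like a CSV header.
gdp_data = pd.DataFrame({
    "Country": ["USA", "Kenya"],
    "1977": [100.0, 10.0],
    "1978": [110.0, 11.0],
})
df = pd.DataFrame({
    "YEAR": [1977, 1978, 1978],
    "Winner_Country": ["USA", "USA", "Kenya"],
})

# Wide -> long: one row per (country, year) with the GDP in a 'gdp' column,
# renaming Country so both frames share the key column names
gdp_long = (
    gdp_data.melt(id_vars="Country", var_name="YEAR", value_name="gdp")
    .rename(columns={"Country": "Winner_Country"})
)
# The year labels become strings in YEAR; convert them to match df['YEAR']
gdp_long["YEAR"] = gdp_long["YEAR"].astype("int64")

# Left merge on both keys attaches the GDP for each winner's country and year
result = df.merge(gdp_long, on=["Winner_Country", "YEAR"], how="left")
print(result)
```

With `how="left"`, every race row is kept, and a country/year pair that has no GDP entry gets NaN in `gdp`.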

What I initially tried:

YEAR = df_gdp['YEAR']
df_gdp['gdp'] = df[YEAR]

This results in the following error:

KeyError: "None of [Int64Index([1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,\n ...\n 2009, 2010, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019],\n dtype='int64', length=258)] are in the [columns]"

A simplified example of the desired result:

Take this example data set:

letter a b c d
a      1 3 4 2  
b      4 3 2 1 
c      2 1 4 3
d      3 4 2 1

Desired result:

letter a b c d  correct answer
a      1 3 4 2  1  
b      4 3 2 1  3 
c      2 1 4 3  4
d      3 4 2 1  1

How can I create the 'correct answer' column?
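One way to build that column for the toy table is the pattern the pandas indexing guide suggests in place of the deprecated `DataFrame.lookup`: factorize the row labels, reorder the value columns to match, then use NumPy integer-array indexing to take one cell per row. A sketch using the example data above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "letter": ["a", "b", "c", "d"],
    "a": [1, 4, 2, 3],
    "b": [3, 3, 1, 4],
    "c": [4, 2, 4, 2],
    "d": [2, 1, 3, 1],
})

# codes[i] is an integer id for row i's letter; uniques lists those letters in order
codes, uniques = pd.factorize(df["letter"])

# Reorder the value columns so column j corresponds to uniques[j]
values = df.reindex(columns=uniques).to_numpy()

# For row i, take the cell in column codes[i]
df["correct answer"] = values[np.arange(len(df)), codes]
print(df)
```

The same idea applies to the marathon data if 'letter' is replaced by the year and the letter columns by the per-year GDP columns (with labels of the same type).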

df['Winner_Country'] == gdp_data['Country'] will return a pandas Series of True and False values, so you wouldn't do this iteratively. Can you give more of an explanation of what you're trying to achieve? Are you trying to join DataFrames on their country? - A Poor
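Both failure modes behind the original if statement can be reproduced with a small sketch (the Series below are made-up values, not the question's data):

```python
import pandas as pd

winners = pd.Series(["USA", "Kenya", "USA"])  # index 0, 1, 2
countries = pd.Series(["Kenya", "USA"])        # index 0, 1

# == between two Series requires identical index labels; otherwise pandas raises
try:
    winners == countries
except ValueError as err:
    label_error = str(err)
print(label_error)

# With identical labels, == compares element-wise and returns a boolean Series...
same = winners == pd.Series(["USA", "USA", "USA"])
print(same.tolist())

# ...which has no single truth value, so using it in an `if` also raises
try:
    if same:
        pass
except ValueError as err:
    truth_error = str(err)
print(truth_error)
```

This is why matching rows across the two DataFrames is usually done with a merge or an index-based lookup rather than an if statement.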

1 Answer


I am not quite sure what you are asking, but I think you are trying to create a gdp column that matches the YEAR column.

If that is the case, I think this should work:

df_gdp['gdp'] = df_gdp.apply(lambda x: x.loc[(x['YEAR'])], axis=1)
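The one-liner assumes the year columns are labelled with integers and that every YEAR has a matching column. If the labels are strings (as they would be if read from a CSV header), `x.loc[x['YEAR']]` raises a KeyError. A hedged variant that converts the year to a string label and returns NaN for missing years might look like this (the sample frame and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame; year columns are strings, as from a CSV header
df_gdp = pd.DataFrame({
    "YEAR": [1977, 1978, 1979],
    "Country": ["USA", "USA", "USA"],
    "1977": [100.0, 100.0, 100.0],
    "1978": [110.0, 110.0, 110.0],
})

# int() guards against YEAR being upcast to float in the row; str() matches the
# string labels; .get returns NaN instead of raising when the year has no column
df_gdp["gdp"] = df_gdp.apply(lambda x: x.get(str(int(x["YEAR"])), np.nan), axis=1)
print(df_gdp[["YEAR", "gdp"]])
```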



Here is how I tested it:

import numpy as np
import pandas as pd

# create test data: one column per year 1970-1989 plus a YEAR column
test = pd.DataFrame(np.random.randint(1000,10000,(20,20)),columns = np.arange(1970,1990))
test['YEAR'] = np.arange(1970,1990)
# for each row, pick the value in the column named by that row's YEAR
test['gdp'] = test.apply(lambda x: x.loc[(x['YEAR'])],axis=1)
print(test[[1970,1971,1972,1973,1974,'YEAR','gdp']].head())

   1970  1971  1972  1973  1974  YEAR   gdp
0  4436  1288  5956  5861  2361  1970  4436
1  8918  5311  9889  2356  4646  1971  5311
2  1129  2582  6304  8488  3783  1972  6304
3  3767  8178  3947  3098  9508  1973  3098
4  7710  7713  5186  3894  9692  1974  9692