9
votes

I've got the following code and it works. This basically renames values in columns so that they can be later merged.

pop = pd.read_csv('population.csv')
pop_recent = pop[pop['Year'] == 2014]

mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
}
f= lambda x: mapping.get(x, x)
pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

Warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

I did google this! But no examples seem to be using map, so I'm at a loss...

2
How did you get pop_recent ? - Anand S Kumar
Try what the warning is telling to - pop.loc[(pop['year'] == 2014), 'Country Name'] = pop.loc[(pop['year'] == 2014), 'Country Name'].map(f) - Anand S Kumar
Isn't this unnecessarily complex? I've already filtered out values other than 2014. - Mike
That filtering is the issue. Check out my answer. - Anand S Kumar

2 Answers

15
votes

The issue is with chained indexing , what you are actually trying to do is to set values to - pop[pop['Year'] == 2014]['Country Name'] - this would not work most of the times (as explained very well in the linked documentation) as this is two different calls and one of the calls may return a copy of the dataframe (I believe the boolean indexing) is returning the copy of the dataframe).

Hence, when you try to set values to that copy, it does not reflect in the original dataframe. Example -

In [6]: df
Out[6]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [7]: df[df['A']==1]['B'] = 10
/path/to/ipython-script.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

In [8]: df
Out[8]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

As noted , instead of chained indexing you should use DataFrame.loc to index the rows as well as the columns to update in a single call, avoiding this error. Example -

pop.loc[(pop['year'] == 2014), 'Country Name'] = pop.loc[(pop['year'] == 2014), 'Country Name'].map(f)

Or if this seem too long to you, you can create a mask (boolean dataframe) beforehand and assign to a variable, and use that in the above statement. Example -

mask = pop['year'] == 2014
pop.loc[mask,'Country Name'] = pop.loc[mask,'Country Name'].map(f)

Demo -

In [9]: df
Out[9]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [10]: mapping = { 1:2 , 3:4}

In [11]: f= lambda x: mapping.get(x, x)

In [12]: df.loc[(df['B']==2),'A'] = df.loc[(df['B']==2),'A'].map(f)

In [13]: df
Out[13]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9

Demo with the mask method -

In [18]: df
Out[18]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [19]: mask = df['B']==2

In [20]: df.loc[mask,'A'] = df.loc[mask,'A'].map(f)

In [21]: df
Out[21]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9
-1
votes

I recommend you to reset indices in pop_recent = pop[pop['Year'] == 2014].

If you want to apply some function to some column of dataframe, try to use function apply function of DataFrame API. Simple demo:

 mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
 }
 df = pandas.DataFrame({'Country':['Korea, Rep.', 'Taiwan, China', 'Japan', 'USA'], 'date':[2014, 2014, 2015, 2014]})
 df_recent = df[df['date'] == 2014].reset_index()
 df_recent['Country'] = df_recent['Country'].apply(lambda x: mapping.get(x, x))

Output:

>>> df_recent
index      Country  date
0      0  South Korea  2014
1      1       Taiwan  2014
2      3          USA  2014