64
votes

I want to count the occurrences of each unique value in one of my Pandas DataFrame columns and then add those counts as a new column to the original DataFrame. I've tried a couple of different things: I created a Pandas Series and calculated counts with the value_counts method, then tried to merge those values back into my original DataFrame, but the keys I want to merge on are in the index (ix/loc). A sketch of what I tried is below the expected output.

For example, given:

Color Value
Red   100
Red   150
Blue  50

I want to return something like:

Color Value Counts
Red   100   2
Red   150   2 
Blue  50    1
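
Roughly what I tried (a sketch; the merge step is where I get stuck, because the counts end up in the index rather than in a column):

import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'Value': [100, 150, 50]})

# value_counts() returns a Series whose index holds the unique colors,
# so there is no shared column to merge the counts back on directly
counts = df['Color'].value_counts()
# Red     2
# Blue    1
# (exact Series/index names vary by pandas version)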
This has been a popular question lately. See this question here, which is almost identical to your situation. - bdiamante

5 Answers

68
votes
df['Counts'] = df.groupby(['Color'])['Value'].transform('count')

For example,

In [102]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})

In [103]: df
Out[103]: 
  Color  Value
0   Red    100
1   Red    150
2  Blue     50

In [104]: df['Counts'] = df.groupby(['Color'])['Value'].transform('count')

In [105]: df
Out[105]: 
  Color  Value  Counts
0   Red    100       2
1   Red    150       2
2  Blue     50       1

Note that transform('count') ignores NaNs. If you want to count NaNs, use transform(len).
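
For instance, with a NaN thrown into the Value column (a small sketch to show the difference; the NonNull/Rows column names are just for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'Value': [100, np.nan, 50]})

# 'count' skips the NaN within each group, while len counts every row in the group
df['NonNull'] = df.groupby(['Color'])['Value'].transform('count')
df['Rows'] = df.groupby(['Color'])['Value'].transform(len)

#   Color  Value  NonNull  Rows
# 0   Red  100.0        1     2
# 1   Red    NaN        1     2
# 2  Blue   50.0        1     1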


To the anonymous editor: If you are getting an error while using transform('count'), it may be because your version of Pandas is too old. The above works with Pandas 0.15 or newer.

17
votes

One other option:

z = df['Color'].value_counts()

z1 = z.to_dict() #converts to dictionary

df['Count_Column'] = df['Color'].map(z1) 

This option gives you a column of counts in which each row holds the frequency of its own value in the 'Color' column.
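
On the example DataFrame from the question, this produces (a quick illustration):

import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'Value': [100, 150, 50]})

z = df['Color'].value_counts()   # Red -> 2, Blue -> 1
z1 = z.to_dict()                 # {'Red': 2, 'Blue': 1}
df['Count_Column'] = df['Color'].map(z1)

#   Color  Value  Count_Column
# 0   Red    100             2
# 1   Red    150             2
# 2  Blue     50             1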

4
votes

df['Counts'] = df.Color.groupby(df.Color).transform('count')

You can do this with any series: group it by itself and call transform('count'):

>>> series = pd.Series(['Red', 'Red', 'Blue'])
>>> series.groupby(series).transform('count')
0    2
1    2
2    1
dtype: int64

4
votes

This answer uses Series.map with Series.value_counts. It was tested with Pandas 1.1.

df['counts'] = df['attribute'].map(df['attribute'].value_counts())

Credit: comment by sacuL
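
Applied to the example from the question (a brief sketch; the column here is 'Color' rather than 'attribute'):

import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'Value': [100, 150, 50]})

# map each row's color to how often that color appears in the column
df['counts'] = df['Color'].map(df['Color'].value_counts())

#   Color  Value  counts
# 0   Red    100       2
# 1   Red    150       2
# 2  Blue     50       1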

3
votes

My initial thought was to use a list comprehension, as shown below, but as was pointed out in the comments this is slower than the groupby-and-transform method. I will leave this answer up to demonstrate WHAT NOT TO DO:

In [94]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})
In [95]: df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in range(len(df))]
In [96]: df
Out[96]: 
  Color  Value  Counts
0   Red    100       2
1   Red    150       2
2  Blue     50       1

[3 rows x 3 columns]

@unutbu's method gets complicated for DataFrames with several columns, which makes this one simpler to code. If you are working with a small DataFrame, this is faster (see below), but otherwise you should NOT use it.

In [97]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = df.groupby(['Color']).transform('count')
100 loops, best of 3: 2.87 ms per loop
In [98]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in range(len(df))]
1000 loops, best of 3: 1.03 ms per loop