How do I get the row count of a Pandas DataFrame?
This table summarises the different situations in which you'd want to count something in a DataFrame (or Series, for completeness), along with the recommended method(s).

| What you want to count | Recommended method(s) |
| --- | --- |
| Rows of a DataFrame | len(df), df.shape[0], or len(df.index) |
| Columns of a DataFrame | df.shape[1] or len(df.columns) |
| Rows of a Series | len(s), s.size, or len(s.index) |
| Non-null rows/values | DataFrame.count [1] or Series.count |
| Rows per group | DataFrameGroupBy.size [2] or SeriesGroupBy.size |
| Non-null rows per group | DataFrameGroupBy.count [3] |
Footnotes
1. DataFrame.count returns counts for each column as a Series, since the non-null count varies by column.
2. DataFrameGroupBy.size returns a Series, since all columns in the same group share the same row count.
3. DataFrameGroupBy.count returns a DataFrame, since the non-null count could differ across columns in the same group. To get the group-wise non-null count for a specific column, use df.groupby(...)['x'].count(), where "x" is the column to count.
# Minimal Code Examples
Below, I show examples of each of the methods described in the table above. First, the setup -
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('aabbc'), 'B': ['x', 'x', np.nan, 'x', np.nan]})
s = df['B'].copy()
df
A B
0 a x
1 a x
2 b NaN
3 b x
4 c NaN
s
0 x
1 x
2 NaN
3 x
4 NaN
Name: B, dtype: object
Row Count of a DataFrame: len(df), df.shape[0], or len(df.index)
len(df)
# 5
df.shape[0]
# 5
len(df.index)
# 5
It seems silly to compare the performance of constant time operations, especially when the difference is on the level of "seriously, don't worry about it". But this seems to be a trend with other answers, so I'm doing the same for completeness.
Of the three methods above, len(df.index) (as mentioned in other answers) is the fastest.
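If you want to verify this on your own machine, here is a minimal (and deliberately unscientific) timing sketch using the standard library's timeit; the 10,000-row DataFrame, named big here, is an arbitrary choice for illustration and the absolute numbers will vary by machine.

import timeit

import numpy as np
import pandas as pd

# An arbitrary DataFrame just for timing purposes.
big = pd.DataFrame(np.random.rand(10_000, 2), columns=['A', 'B'])

# All three are O(1); any differences are a few nanoseconds of call overhead.
print(timeit.timeit(lambda: len(big), number=100_000))
print(timeit.timeit(lambda: big.shape[0], number=100_000))
print(timeit.timeit(lambda: len(big.index), number=100_000))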
Note
- All the methods above are constant time operations, as they amount to simple attribute lookups.
- df.shape (similar to ndarray.shape) is an attribute that returns a tuple of (# Rows, # Cols). For example, df.shape returns (5, 2) for the example here.
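As a quick illustration with the setup DataFrame (nrows and ncols are just names picked for this example):

df.shape
# (5, 2)

# The result is an ordinary tuple, so it can be unpacked.
nrows, ncols = df.shape
nrows, ncols
# (5, 2)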
Column Count of a DataFrame: df.shape[1], len(df.columns)
df.shape[1]
# 2
len(df.columns)
# 2
Analogous to len(df.index), len(df.columns) is the faster of the two methods (but takes more characters to type).
Row Count of a Series: len(s), s.size, len(s.index)
len(s)
# 5
s.size
# 5
len(s.index)
# 5
s.size and len(s.index) are about the same in terms of speed. But I recommend len(s).
Note
- size is an attribute, and it returns the number of elements (i.e., the count of rows for any Series). DataFrames also define a size attribute, which returns the same result as df.shape[0] * df.shape[1].
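To make that concrete with the setup data (note that DataFrame.size counts cells, not rows, so it is usually not what you want for a row count):

s.size
# 5
df.size                      # 5 rows * 2 columns
# 10
df.shape[0] * df.shape[1]
# 10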
Non-Null Row Count: DataFrame.count and Series.count
The methods described here only count non-null values (meaning NaNs are ignored).
Calling DataFrame.count will return non-NaN counts for each column:
df.count()
A 5
B 3
dtype: int64
For Series, use Series.count to similar effect:
s.count()
# 3
Group-wise Row Count: GroupBy.size
For DataFrames, use DataFrameGroupBy.size to count the number of rows per group.
df.groupby('A').size()
A
a 2
b 2
c 1
dtype: int64
Similarly, for Series, you'll use SeriesGroupBy.size.
s.groupby(df.A).size()
A
a 2
b 2
c 1
Name: B, dtype: int64
In both cases, a Series is returned. This makes sense for DataFrames as well, since all groups share the same row count.
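If you want to confirm the return types for yourself, a quick check on the setup objects:

type(df.groupby('A').size())
# <class 'pandas.core.series.Series'>
type(s.groupby(df.A).size())
# <class 'pandas.core.series.Series'>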
Group-wise Non-Null Row Count: GroupBy.count
Similar to above, but use GroupBy.count, not GroupBy.size. Note that size always returns a Series, while count returns a Series if called on a specific column, or else a DataFrame.
The following methods return the same counts (the column-specific version simply carries the name "B"):
df.groupby('A')['B'].size()
df.groupby('A').size()
A
a 2
b 2
c 1
Name: B, dtype: int64
Meanwhile, for count, we have
df.groupby('A').count()
B
A
a 2
b 1
c 0
...called on the entire GroupBy object, vs.,
df.groupby('A')['B'].count()
A
a 2
b 1
c 0
Name: B, dtype: int64
...called on a specific column.
df.count() will only return the count of non-NA/NaN rows for each column. You should use df.shape[0] instead, which will always correctly tell you the number of rows. – smci
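To make the distinction in that comment concrete with the setup DataFrame (column "B" contains two NaNs):

df['B'].count()   # non-null values only
# 3
df.shape[0]       # total number of rows, NaNs included
# 5
len(df)
# 5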