In [1]: from scipy import sparse
In [2]: M = sparse.random(10,10,.2, 'csr')
In [3]: M
Out[3]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Row format>
In [4]: M.astype(bool)
Out[4]:
<10x10 sparse matrix of type '<class 'numpy.bool_'>'
with 20 stored elements in Compressed Sparse Row format>
In [6]: M.astype(bool).sum(axis=0)
Out[6]: matrix([[0, 3, 4, 3, 1, 3, 1, 0, 2, 3]], dtype=int64)
Compare that with the dense array, converted to 0/1 integers:
In [7]: M.astype(bool).astype(int).A
Out[7]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
       [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]])
Check the total against the matrix nnz:
In [8]: M.astype(bool).sum(axis=0).sum()
Out[8]: 20
With axis=0, the sum is across rows, giving one value per column. For a sum across columns (one value per row), use axis=1:
In [13]: M.astype(bool).sum(axis=1)
Out[13]:
matrix([[0],
        [4],
        [2],
        [2],
        [3],
        [1],
        [4],
        [1],
        [1],
        [2]])
This is an (n,1) dense matrix. You can use A1 to make a 1d array: M.astype(bool).sum(axis=1).A1
The distinction is easier to see when the matrix isn't square.
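To make that concrete, here is a small sketch with a non-square matrix. The 2x3 matrix and the names `M2`, `per_column` and `per_row` are made up for illustration and are not part of the session above.

```python
import numpy as np
from scipy import sparse

# Hypothetical 2x3 example (2 rows, 3 columns), chosen so the two
# axis results have different lengths.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0]])
M2 = sparse.csr_matrix(dense)

mask = M2.astype(bool)

# axis=0 sums down the rows: one count per column (length 3).
per_column = mask.sum(axis=0).A1

# axis=1 sums across the columns: one count per row (length 2).
per_row = mask.sum(axis=1).A1

print(per_column.shape, per_row.shape)
```

With a square matrix both results have the same length, so it is easy to pick the wrong axis without noticing; here the shapes give it away.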
np.count_nonzero can do the same with the dense array (but not the sparse one):
In [15]: np.count_nonzero(M.A,axis=1)
Out[15]: array([0, 4, 2, 2, 3, 1, 4, 1, 1, 2])
With @fuglede's indptr approach:
In [18]: np.diff(M.indptr)
Out[18]: array([0, 4, 2, 2, 3, 1, 4, 1, 1, 2], dtype=int32)
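One caveat, sketched below on a made-up 3x4 matrix (`M3` and the other names are illustrative only): in CSR format `indptr` counts stored entries per row, so `np.diff(M.indptr)` matches the boolean sum only as long as the matrix holds no explicitly stored zeros. The sketch also uses `getnnz(axis=1)`, which should give the same stored-entry counts, and `np.bincount` on `indices` as one possible way to get per-column counts without converting to CSC.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 example; the middle row is empty.
dense = np.array([[0, 5, 0, 7],
                  [0, 0, 0, 0],
                  [1, 0, 2, 3]], dtype=float)
M3 = sparse.csr_matrix(dense)

# Stored entries per row, read straight from the CSR row pointers.
row_counts = np.diff(M3.indptr)
row_counts_nnz = M3.getnnz(axis=1)

# Stored entries per column: count how often each column index occurs.
col_counts = np.bincount(M3.indices, minlength=M3.shape[1])

# Overwrite one stored value with 0. It stays in the structure, so the
# indptr-based count no longer equals the number of nonzero values.
M3.data[0] = 0.0
stored_after = np.diff(M3.indptr)
nonzero_after = M3.astype(bool).sum(axis=1).A1

# Dropping the explicit zero brings the two counts back in line.
M3.eliminate_zeros()
cleaned = np.diff(M3.indptr)
```

If the matrix may contain explicit zeros, calling `eliminate_zeros()` first (it modifies the matrix in place) or using the `astype(bool).sum(...)` form is the safer choice.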