Mean of non zero values in sparse matrix?

Question

I'm trying to calculate the mean of the non-zero values in each row of a sparse row (CSR) matrix. The matrix's mean method doesn't do this, because it also counts the zeros:

>>> from scipy.sparse import csr_matrix
>>> a = csr_matrix([[0, 0, 2], [1, 3, 8]])
>>> a.mean(axis=1)
matrix([[ 0.66666667],
        [ 4.        ]])

The following works but is slow for large matrices:

>>> import numpy as np
>>> b = np.zeros(a.shape[0])
>>> for i in range(a.shape[0]):
...    b[i] = a.getrow(i).data.mean()
... 
>>> b
array([ 2.,  4.])

Could anyone please tell me if there is a faster method?

Antonio Ragagnin · Accepted Answer · 2015-12-14T12:59:21

This looks like a typical use case for numpy.bincount. I used three calls:

(x,y,z)=scipy.sparse.find(a)

returns the row indices (x), column indices (y) and values (z) of the non-zero entries of the sparse matrix. For instance, here x is array([0, 1, 1, 1]).

numpy.bincount(x) returns, for each row number, how many non-zero elements that row has.

numpy.bincount(x, weights=z) returns, for each row, the sum of its non-zero elements.
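To make the intermediate results concrete, here is a small sketch of the two bincount calls on the matrix from the question. The variable names rows, cols, values, counts and sums are only illustrative and do not come from the original answer.

```python
import numpy as np
from scipy.sparse import csr_matrix, find

a = csr_matrix([[0, 0, 2], [1, 3, 8]])

# Row indices, column indices and values of the non-zero entries
rows, cols, values = find(a)

# Number of non-zero entries per row: 1 in row 0, 3 in row 1
counts = np.bincount(rows)

# Sum of the non-zero entries per row: 2 in row 0, 1 + 3 + 8 = 12 in row 1
sums = np.bincount(rows, weights=values)

print(counts)
print(sums)
```

Dividing sums by counts element-wise then gives the per-row mean of the non-zero values.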

The final working code:

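import numpy
import scipy.sparse
from scipy.sparse import csr_matrix

a = csr_matrix([[0, 0, 2], [1, 3, 8]])

# Row indices, column indices and values of the non-zero entries
(x, y, z) = scipy.sparse.find(a)
countings = numpy.bincount(x)          # number of non-zero entries per row
sums = numpy.bincount(x, weights=z)    # sum of the non-zero entries per row
averages = sums / countings            # mean of the non-zero entries per row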

print(averages)

prints:

[ 2.  4.]
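One caveat: if a row has no non-zero entries, its count is 0 and the division is undefined, and if the last rows are empty, bincount returns fewer entries than the matrix has rows. The following is a minimal sketch of one way to handle that, assuming NaN is an acceptable result for empty rows. The helper name nonzero_row_means is hypothetical and not part of NumPy or SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix, find

def nonzero_row_means(a):
    """Hypothetical helper: mean of the non-zero entries of each row.

    Rows without any non-zero entry get NaN (an assumption of this sketch).
    """
    n_rows = a.shape[0]
    rows, _, values = find(a)
    # minlength pads the result so that trailing empty rows are included
    counts = np.bincount(rows, minlength=n_rows)
    sums = np.bincount(rows, weights=values, minlength=n_rows)
    # Start from NaN and divide only where at least one entry exists
    means = np.full(n_rows, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

a = csr_matrix([[0, 0, 2], [1, 3, 8], [0, 0, 0]])
print(nonzero_row_means(a))
```

For the three-row matrix above, the first two means are 2 and 4 as before, and the empty third row yields NaN instead of a division warning.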
