Make a sample:
In [51]: A=np.array([[1,2,0,0],[0,0,0,0],[2,0,3,0]])
In [52]: M=sparse.csr_matrix(A)
In lil format, the values for each row are stored in a list.
In [56]: Ml=M.tolil()
In [57]: Ml.data
Out[57]: array([[1, 2], [], [2, 3]], dtype=object)
Take the product of each of those:
In [58]: np.array([np.prod(i) for i in Ml.data])
Out[58]: array([ 2., 1., 6.])
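Here is a self-contained sketch of the lil approach (row_prod_lil is a hypothetical name, not a SciPy function). np.prod([]) returns the float 1.0, which is why Out[58] above is a float array. Passing dtype=M.dtype, an assumption of this sketch, keeps integer input as integers.

```python
import numpy as np
from scipy import sparse

def row_prod_lil(M):
    """Product of the stored values in each row (hypothetical helper).

    Empty rows give 1, the multiplicative identity.
    """
    Ml = M.tolil()
    # Ml.data is an object array holding one Python list per row;
    # dtype=M.dtype stops the empty list from turning the result into floats
    return np.array([np.prod(row, dtype=M.dtype) for row in Ml.data])

A = np.array([[1, 2, 0, 0], [0, 0, 0, 0], [2, 0, 3, 0]])
M = sparse.csr_matrix(A)
print(row_prod_lil(M))
```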
In csr format, the values are stored as:
In [53]: M.data
Out[53]: array([1, 2, 2, 3], dtype=int32)
In [54]: M.indices
Out[54]: array([0, 1, 0, 2], dtype=int32)
In [55]: M.indptr
Out[55]: array([0, 2, 2, 4], dtype=int32)
indptr gives the start of each row's values in data (and indices). Calculation code for csr (and csc) matrices routinely performs calculations like this, but in compiled code:
In [94]: lst=[]; i=M.indptr[0]
In [95]: for j in M.indptr[1:]:
    ...:     lst.append(np.prod(M.data[i:j]))
    ...:     i = j
In [96]: lst
Out[96]: [2, 1, 6]
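The same indptr walk computes other row-wise reductions. As a sketch, here is a row sum written as a Python loop (row_sums_loop is a hypothetical helper), checked against SciPy's compiled M.sum(axis=1). That method returns an (n, 1) matrix, hence the ravel. An empty slice sums to 0, just as it multiplies to 1 above.

```python
import numpy as np
from scipy import sparse

def row_sums_loop(M):
    """Sum the stored values of each csr row by walking indptr (hypothetical helper)."""
    out = []
    for start, stop in zip(M.indptr[:-1], M.indptr[1:]):
        # an empty slice sums to 0, the additive identity
        out.append(M.data[start:stop].sum())
    return np.array(out)

A = np.array([[1, 2, 0, 0], [0, 0, 0, 0], [2, 0, 3, 0]])
M = sparse.csr_matrix(A)
print(row_sums_loop(M))                   # [3 0 5]
print(np.asarray(M.sum(axis=1)).ravel())  # [3 0 5]
```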
With Divakar's test matrix:
In [137]: M.toarray()
Out[137]:
array([[-1, 2, 0, 0],
[ 0, 0, 0, 0],
[ 2, 0, 3, 0],
[ 4, 5, 6, 0],
[ 1, 9, 0, 2]], dtype=int32)
The loop above, wrapped in a function foo, produces:
In [138]: foo(M)
Out[138]: [-2, 1, 6, 120, 18]
Divakar's code, using unique and reduceat:
In [139]: divk(M)
Out[139]: array([ -2, 0, 6, 120, 18], dtype=int32)
(Note the different value for the empty row.)
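To reproduce these outputs, the test matrix can be built from its dense form. int32 is assumed here to match the dtype shown above, and B is just an illustrative name.

```python
import numpy as np
from scipy import sparse

# Dense form of the 5x4 test matrix shown in Out[137]
B = np.array([[-1, 2, 0, 0],
              [ 0, 0, 0, 0],
              [ 2, 0, 3, 0],
              [ 4, 5, 6, 0],
              [ 1, 9, 0, 2]], dtype=np.int32)
M = sparse.csr_matrix(B)

# Stored values, row by row, and the row boundaries into them
print(M.data)
print(M.indptr)
```

Row 1 has indptr[1] == indptr[2] == 2, i.e. it stores no values.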
Using reduceat with indptr directly is simply:
In [140]: np.multiply.reduceat(M.data,M.indptr[:-1])
Out[140]: array([ -2, 2, 6, 120, 18], dtype=int32)
The value for the empty 2nd row needs to be fixed: with indptr values of [2,2,...], reduceat returns M.data[2] for that row.
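Here is a small 1-D sketch of the reduceat rule at work. For each k, np.multiply.reduceat(a, idx) multiplies a[idx[k]:idx[k+1]], and the last entry runs to the end of a. When idx[k] >= idx[k+1], it returns a[idx[k]] rather than an empty product. np.diff(indptr) == 0 flags exactly those rows. The arrays below are copied by hand from the test matrix.

```python
import numpy as np

# data and indptr of the 5-row test matrix above
data = np.array([-1, 2, 2, 3, 4, 5, 6, 1, 9, 2], dtype=np.int32)
indptr = np.array([0, 2, 2, 4, 7, 10])

raw = np.multiply.reduceat(data, indptr[:-1])
# Row 1 is empty (indptr[1] == indptr[2] == 2), so reduceat gave data[2]
# instead of a product over zero values.
empty = np.diff(indptr) == 0
res = raw.copy()
res[empty] = 1          # empty product -> multiplicative identity
print(res)
```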
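import numpy as np

def wptr(M, empty_val=1):
    # Pad data with 1 (the multiplicative identity) so that an empty last row,
    # whose indptr value equals len(M.data), is still a valid reduceat index.
    data = np.concatenate([M.data, np.ones(1, dtype=M.data.dtype)])
    res = np.multiply.reduceat(data, M.indptr[:-1])
    # reduceat returns data[start] for empty rows; overwrite those
    res[np.diff(M.indptr) == 0] = empty_val
    return res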
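The following is a quick self-contained check, offered as a sketch. The function is repeated with a padding step, so a matrix whose last row is empty does not make reduceat raise an IndexError. The hypothetical C below is such a matrix: for it, indptr[:-1] ends at len(C.data).

```python
import numpy as np
from scipy import sparse

def wptr(M, empty_val=1):
    # pad with the multiplicative identity so an empty last row is a valid index
    data = np.concatenate([M.data, np.ones(1, dtype=M.data.dtype)])
    res = np.multiply.reduceat(data, M.indptr[:-1])
    res[np.diff(M.indptr) == 0] = empty_val  # overwrite empty rows
    return res

B = np.array([[-1, 2, 0, 0], [0, 0, 0, 0], [2, 0, 3, 0],
              [4, 5, 6, 0], [1, 9, 0, 2]], dtype=np.int32)
print(wptr(sparse.csr_matrix(B)))   # values -2, 1, 6, 120, 18, as from foo(M)

# Hypothetical matrix with an empty last row: indptr is [0 2 3 3]
C = sparse.csr_matrix(np.array([[1, 2], [3, 0], [0, 0]]))
print(wptr(C))                      # values 2, 3, 1
```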
With a larger matrix, Mb=sparse.random(1000,1000,.1,format='csr'), this wptr is about 30x faster than Divakar's version.
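Here is a sketch for timing it yourself. Divakar's code is not reproduced here, so this compares wptr with the Python-level indptr loop from earlier (loop_prod is a hypothetical name). No timings are claimed, since they depend on the machine and on the NumPy and SciPy versions. sparse.random fills the nonzeros with floats in [0, 1), so np.allclose is used for the comparison.

```python
import timeit

import numpy as np
from scipy import sparse

def wptr(M, empty_val=1):
    # reduceat over indptr, padded so an empty last row is a valid index
    data = np.concatenate([M.data, np.ones(1, dtype=M.data.dtype)])
    res = np.multiply.reduceat(data, M.indptr[:-1])
    res[np.diff(M.indptr) == 0] = empty_val
    return res

def loop_prod(M):
    # Python-level loop over indptr, one np.prod call per row
    return np.array([np.prod(M.data[s:e])
                     for s, e in zip(M.indptr[:-1], M.indptr[1:])])

Mb = sparse.random(1000, 1000, .1, format='csr')
assert np.allclose(wptr(Mb), loop_prod(Mb))

for f in (wptr, loop_prod):
    t = timeit.timeit(lambda: f(Mb), number=10) / 10
    print(f"{f.__name__}: {t * 1e3:.3f} ms per call")
```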
More discussion on calculating values across rows of a sparse matrix:
Scipy.sparse.csr_matrix: How to get top ten values and indices?