2
votes

I have a sparse CSC matrix with many zero elements, and I would like to compute, for each row, the product of its nonzero elements across all columns.

i.e.:

 A = [[1,2,0,0],
      [2,0,3,0]]

should be converted to:

V = [[2,
      6]]

With a dense numpy matrix this can be accomplished by replacing all zeros with ones and calling A.prod(1). This is however not an option, since the dense matrix would be too large.
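For reference, here is a minimal sketch of that dense baseline on the small example (the name `dense_ones` is only illustrative). It shows the intended result, not a solution that scales:

```python
import numpy as np

A = np.array([[1, 2, 0, 0],
              [2, 0, 3, 0]])

# Replace zeros with ones so they do not affect the product,
# then multiply across the columns of each row.
dense_ones = np.where(A == 0, 1, A)
V = dense_ones.prod(axis=1)
print(V)
```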

Is there any way to accomplish this without converting the sparse matrix into a dense one?

3 Answers

2
votes

Approach #1: We can use the row indices of the sparse elements as IDs and perform multiplication of the corresponding values of those elements with np.multiply.reduceat to get the desired output.

Thus, an implementation would be -

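import numpy as np
from scipy import sparse
from scipy.sparse import csc_matrix

r,c,v = sparse.find(a) # a is input sparse matrix; r,c,v = row indices, column indices, nonzero values
out = np.zeros(a.shape[0],dtype=a.dtype) # rows without nonzero entries stay 0
unqr, shift_idx = np.unique(r,return_index=True) # start of each row's segment (assumes r is grouped by row)
out[unqr] = np.multiply.reduceat(v, shift_idx) # multiply the values within each segment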

Sample run -

In [89]: # Let's create a sample csc_matrix
    ...: A = np.array([[-1,2,0,0],[0,0,0,0],[2,0,3,0],[4,5,6,0],[1,9,0,2]])
    ...: a = csc_matrix(A)
    ...: 

In [90]: a
Out[90]: 
<5x4 sparse matrix of type '<type 'numpy.int64'>'
    with 10 stored elements in Compressed Sparse Column format>

In [91]: a.toarray()
Out[91]: 
array([[-1,  2,  0,  0],
       [ 0,  0,  0,  0],
       [ 2,  0,  3,  0],
       [ 4,  5,  6,  0],
       [ 1,  9,  0,  2]])

In [92]: out
Out[92]: array([ -2,   0,   6, 120,  18])
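Since reduceat works on contiguous segments, the approach relies on the row indices returned by sparse.find being grouped by row. A hedged, self-contained sketch that does not depend on that ordering sorts the entries by row first (the stable argsort is an addition, not part of the code above):

```python
import numpy as np
from scipy import sparse

A = np.array([[-1, 2, 0, 0],
              [0, 0, 0, 0],
              [2, 0, 3, 0],
              [4, 5, 6, 0],
              [1, 9, 0, 2]])
a = sparse.csc_matrix(A)

r, c, v = sparse.find(a)

# Group the nonzero values by row, whatever order find() returned them in.
order = np.argsort(r, kind="stable")
r, v = r[order], v[order]

out = np.zeros(a.shape[0], dtype=a.dtype)
# shift_idx marks where each row's run of values starts in the sorted arrays.
unqr, shift_idx = np.unique(r, return_index=True)
out[unqr] = np.multiply.reduceat(v, shift_idx)
print(out)
```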

Approach #2: We are performing bin-based multiplication, and for bin-based summing we already have np.bincount. So a trick that could be used here is to take the logarithm of the values, perform bin-based summing, and then convert back with the exponential (the inverse of log). This works because the log of a product is the sum of the logs. Negative numbers would need an extra step or two, but let's see what the implementation looks like for non-negative numbers -

r,c,v = sparse.find(a)
out = np.exp(np.bincount(r,np.log(v),minlength = a.shape[0]))
out[np.setdiff1d(np.arange(a.shape[0]),r)] = 0

A sample run with non-negative numbers -

In [118]: a.toarray()
Out[118]: 
array([[1, 2, 0, 0],
       [0, 0, 0, 0],
       [2, 0, 3, 0],
       [4, 5, 6, 0],
       [1, 9, 0, 2]])

In [120]: out  # Using listed code
Out[120]: array([   2.,    0.,    6.,  120.,   18.])
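One possible way to add the extra step for negative numbers is to sum the logs of the absolute values and track the sign separately by counting negative entries per row. This is only a sketch of that idea, not code from the answer; the names `log_abs`, `neg_count` and `sign` are illustrative, and the result is floating point, so it is approximate for large products:

```python
import numpy as np
from scipy import sparse

A = np.array([[-1, 2, 0, 0],
              [0, 0, 0, 0],
              [2, 0, 3, 0],
              [4, 5, 6, 0],
              [1, 9, 0, 2]])
a = sparse.csc_matrix(A)
n = a.shape[0]

r, c, v = sparse.find(a)  # find() returns only nonzero values, so log is safe

# Sum of log|v| per row gives the log of the product's magnitude.
log_abs = np.bincount(r, weights=np.log(np.abs(v)), minlength=n)

# An odd number of negative entries in a row makes the product negative.
neg_count = np.bincount(r, weights=(v < 0).astype(float), minlength=n)
sign = np.where(neg_count % 2 == 0, 1.0, -1.0)

out = sign * np.exp(log_abs)

# Rows without any nonzero entry are set to 0, as in the code above.
out[np.bincount(r, minlength=n) == 0] = 0
print(out)
```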
1
votes

Make a sample:

In [51]: A=np.array([[1,2,0,0],[0,0,0,0],[2,0,3,0]])
In [52]: M=sparse.csr_matrix(A)

In lil format, values for each row are stored in a list.

In [56]: Ml=M.tolil()
In [57]: Ml.data
Out[57]: array([[1, 2], [], [2, 3]], dtype=object)

Take the product of each of those:

In [58]: np.array([np.prod(i) for i in Ml.data])
Out[58]: array([ 2.,  1.,  6.])

In csr format values are stored as:

In [53]: M.data
Out[53]: array([1, 2, 2, 3], dtype=int32)
In [54]: M.indices
Out[54]: array([0, 1, 0, 2], dtype=int32)
In [55]: M.indptr
Out[55]: array([0, 2, 2, 4], dtype=int32)

indptr gives the start of each row's values in data. Compiled code for csr (and csc) matrices routinely performs calculations like this:

In [94]: lst=[]; i=M.indptr[0]
In [95]: for j in M.indptr[1:]:
    ...:     lst.append(np.prod(M.data[i:j]))
    ...:     i = j    
In [96]: lst
Out[96]: [2, 1, 6]
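Later in this answer the loop is called as foo(M), but the definition is not shown. A plausible sketch, assuming foo simply wraps the loop above in a function:

```python
import numpy as np
from scipy import sparse

def foo(M):
    """Row products of the stored values of a csr matrix; empty rows give 1."""
    lst = []
    i = M.indptr[0]
    for j in M.indptr[1:]:
        # M.data[i:j] holds the stored values of one row.
        lst.append(np.prod(M.data[i:j]))
        i = j
    return lst

M = sparse.csr_matrix(np.array([[1, 2, 0, 0], [0, 0, 0, 0], [2, 0, 3, 0]]))
result = foo(M)
print(result)
```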

With Divakar's test matrix:

In [137]: M.A
Out[137]: 
array([[-1,  2,  0,  0],
       [ 0,  0,  0,  0],
       [ 2,  0,  3,  0],
       [ 4,  5,  6,  0],
       [ 1,  9,  0,  2]], dtype=int32)

the above loop, wrapped in a function foo, produces:

In [138]: foo(M)
Out[138]: [-2, 1, 6, 120, 18]

Divakar's code with unique and reduceat, wrapped in a function divk, gives:

In [139]: divk(M)
Out[139]: array([ -2,   0,   6, 120,  18], dtype=int32)

(The two differ only in the value returned for the empty row.)

Reduceat with indptr is simply:

In [140]: np.multiply.reduceat(M.data,M.indptr[:-1])
Out[140]: array([ -2,   2,   6, 120,  18], dtype=int32)

The value for the empty second row needs to be fixed: with indptr values of [2,2,...], reduceat simply returns M.data[2].

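def wptr(M, empty_val=1):
    # M is a csr matrix; indptr[:-1] holds the start of each row's slice of M.data
    res = np.multiply.reduceat(M.data, M.indptr[:-1])
    # rows with no stored values have equal consecutive indptr entries
    mask = np.diff(M.indptr)==0
    res[mask] = empty_val
    return res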
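One edge case to be aware of: if the last row is empty, its start index in indptr[:-1] equals len(M.data), which reduceat rejects as out of bounds. A minimal sketch of a variant that passes only the starts of non-empty rows to reduceat (the name `wptr_safe` is hypothetical, and the conversion to csr is an added convenience):

```python
import numpy as np
from scipy import sparse

def wptr_safe(M, empty_val=1):
    """Row products of stored values; rows without stored values get empty_val."""
    M = sparse.csr_matrix(M)
    nonempty = np.diff(M.indptr) > 0
    res = np.full(M.shape[0], empty_val, dtype=M.dtype)
    if nonempty.any():
        # Starts of non-empty rows are strictly increasing and in bounds,
        # and each segment ends where the next non-empty row begins.
        starts = M.indptr[:-1][nonempty]
        res[nonempty] = np.multiply.reduceat(M.data, starts)
    return res

A = np.array([[-1, 2, 0, 0],
              [0, 0, 0, 0],
              [2, 0, 3, 0],
              [4, 5, 6, 0],
              [1, 9, 0, 2]])
res = wptr_safe(sparse.csr_matrix(A))
trailing = wptr_safe(sparse.csr_matrix(np.array([[1, 2], [0, 0]])))
print(res, trailing)
```

Explicitly stored zeros still count as values here, so a row containing one yields 0.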

With a larger matrix

Mb=sparse.random(1000,1000,.1,format='csr')

this wptr is about 30x faster than Divakar's version.

More discussion on calculating values across rows of a sparse matrix: Scipy.sparse.csr_matrix: How to get top ten values and indices?

0
votes

You can use NumPy's prod() function to compute the product of the elements in each sublist of A, skipping the zeros. Note that A here is a dense nested list, as in the question's example.

import numpy as np
A = [[1,2,0,0],[2,0,3,0]]
# product of the nonzero elements of each row
print([[int(np.prod([x for x in row if x != 0])) for row in A]])