How this mixed scipy.sparse / numpy program should be handled

Question

I am trying to use numpy as well as scipy to handle sparse matrices, but while evaluating the sparsity of a matrix I ran into trouble, and I don't understand the following behaviour:

import numpy as np
import scipy.sparse as sp

a = sp.csc_matrix(np.ones((3, 3)))
a
np.count_nonzero(a)

Evaluating a and its nonzero count with the code above gives this output in IPython:

Out[9]: <3x3 sparse matrix of type '<class 'numpy.float64'>' with 9 stored elements in Compressed Sparse Column format>

Out[10]: 1

I think there is something I don't understand here. A 3x3 matrix full of ones should have 9 nonzero terms, and that is the answer I get if I first convert the matrix with scipy's toarray method. Am I using numpy and scipy the wrong way?

scipy's sparse matrices are not subclasses of numpy arrays or matrices. Most numpy functions will not correctly handle a scipy sparse matrix. — Warren Weckesser

hpaulj · Accepted Answer · 2016-01-17T17:48:31

The nonzero count is available as an attribute:

In [295]: a=sparse.csr_matrix(np.arange(9).reshape(3,3))
In [296]: a
Out[296]: 
<3x3 sparse matrix of type '<class 'numpy.int32'>'
    with 8 stored elements in Compressed Sparse Row format>
In [297]: a.nnz
Out[297]: 8

As Warren commented, you can't count on numpy functions working on sparse matrices. Use the sparse functions and methods instead. Sometimes a numpy function is written so that it delegates to the array's own method, in which case the call might work, but that is true only on a case-by-case basis.
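As a minimal sketch of that advice, the snippet below counts the nonzeros of the matrix from the question in three ways that stay within what sparse matrices support. The variable names are only illustrative, and it assumes a SciPy version where csc_matrix is importable directly from scipy.sparse.

```python
import numpy as np
import scipy.sparse as sp

# Same matrix as in the question: a 3x3 block of ones stored as CSC.
a = sp.csc_matrix(np.ones((3, 3)))

# 1. Sparse attribute: number of stored entries.
stored = a.nnz

# 2. Convert to a dense ndarray first, then use the numpy function.
via_dense = np.count_nonzero(a.toarray())

# 3. Sparse method: row and column indices of the nonzero entries.
rows, cols = a.nonzero()

print(stored, via_dense, len(rows))
```

All three agree here because every stored entry is nonzero. Option 2 allocates the full dense array, so it is only reasonable for small matrices.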

In IPython I make heavy use of a.<tab> to get a list of completions (attributes and methods). I also use function?? to look at the code.

In the case of np.count_nonzero I see no code - it is compiled, and only works on np.ndarray objects.

np.nonzero(a) works. Look at its code, and see that it looks for the array's method: nonzero = a.nonzero

The sparse nonzero method code is:

def nonzero(self):
    ...
    # convert to COOrdinate format
    A = self.tocoo()
    nz_mask = A.data != 0
    return (A.row[nz_mask],A.col[nz_mask])

The A.data != 0 line is there because it is possible to construct a matrix with explicitly stored zeros, particularly if you use the coo (data, (i, j)) input format. Apart from that caveat, the nnz attribute gives a reliable count.
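To make the caveat concrete, here is a small sketch that builds a COO matrix from (data, (i, j)) with one explicitly stored zero. The arrays are made up for illustration. nnz counts stored entries, while nonzero() filters on the values.

```python
import numpy as np
import scipy.sparse as sp

# Three stored entries on the diagonal; the middle one is an explicit zero.
data = np.array([1.0, 0.0, 2.0])
row = np.array([0, 1, 2])
col = np.array([0, 1, 2])
m = sp.coo_matrix((data, (row, col)), shape=(3, 3))

stored = m.nnz                 # counts every stored entry, zero or not
rows, cols = m.nonzero()       # applies the data != 0 mask shown above
true_nonzero = len(rows)

print(stored, true_nonzero)
```

Here stored is 3 and true_nonzero is 2, which is exactly the discrepancy the mask guards against.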

Doing a.<tab> I also see the a.getnnz and a.eliminate_zeros methods, which may be helpful if you are worried about sneaky zeros.
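A short sketch of how those two methods could be used together, assuming a CSR matrix that was built with an explicit zero (the input arrays are illustrative). Note that eliminate_zeros works in place and returns nothing.

```python
import numpy as np
import scipy.sparse as sp

# CSR matrix built from (data, (i, j)); the middle entry is an explicit zero.
data = np.array([1.0, 0.0, 2.0])
idx = np.array([0, 1, 2])
m = sp.csr_matrix((data, (idx, idx)), shape=(3, 3))

before = m.getnnz()    # stored entries, including the explicit zero
m.eliminate_zeros()    # drops stored zeros in place
after = m.getnnz()     # stored entries once the zero is gone

print(before, after)
```

After the call, nnz and the true nonzero count agree again.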

Sometimes it is useful to work directly with the data attributes of a sparse matrix. It's safer to access them than to modify them. But each sparse format has different attributes. In the csr case you can do:

In [306]: np.count_nonzero(a.data)
Out[306]: 8
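As a rough illustration of how the underlying attributes differ between formats, the sketch below compares the same matrix in CSR and COO form. It only reads the attributes and does not modify them.

```python
import numpy as np
import scipy.sparse as sp

a = sp.csr_matrix(np.arange(9).reshape(3, 3))

# CSR keeps the values in data, with column indices and row pointers alongside.
csr_count = np.count_nonzero(a.data)
print(a.data, a.indices, a.indptr)

# COO keeps the values in data, with explicit row and col index arrays.
c = a.tocoo()
coo_count = np.count_nonzero(c.data)
print(c.data, c.row, c.col)
```

Both formats expose a flat data array, so np.count_nonzero works on it in either case, but the index attributes (indices/indptr versus row/col) are format specific.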
