Assume I have a sparse matrix:

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

I want to zero out columns 0 and 2. This is the result I want:

array([[0, 0, 0],
       [0, 0, 0],
       [0, 5, 0]])

Below is what I tried:

sp_mat = csr_matrix((data, indices, indptr), shape=(3, 3))
zero_cols = np.array([0, 2])
sp_mat[:, zero_cols] = 0

However, I get a warning:

SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

Since my sp_mat is large, converting it to lil_matrix is very slow.

Is there an efficient way to zero these columns while staying in CSR format?

1 Answer

In [87]: import numpy as np
    ...: from scipy import sparse
    ...: indptr = np.array([0, 2, 3, 6])
    ...: indices = np.array([0, 2, 2, 0, 1, 2])
    ...: data = np.array([1, 2, 3, 4, 5, 6])
    ...: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [88]: M
Out[88]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>

Look at what happens with the csr assignment:

In [89]: M[:, [0, 2]] = 0
/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py:746: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
In [90]: M
Out[90]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [91]: M.data
Out[91]: array([0, 0, 0, 0, 0, 5, 0])
In [92]: M.indices
Out[92]: array([0, 2, 0, 2, 0, 1, 2], dtype=int32)

Not only does it give a warning, it actually increases the number of stored elements (from 6 to 7), even though most of them now hold the value 0. Those explicit zeros are only removed when we clean up:

In [93]: M.eliminate_zeros()
In [94]: M
Out[94]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>

In the indexed assignment, csr doesn't distinguish between setting 0s and setting other values. It treats them all the same, storing each assigned value explicitly.

I should note that the efficiency warning is given primarily to keep users from doing this repeatedly (as in a loop). For a one-time action it is overly alarmist.
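If the one-time assignment is acceptable, the warning can be silenced locally. The following is only a sketch, assuming the standard library warnings module and the SparseEfficiencyWarning class exported by scipy.sparse; the cleanup step is the same eliminate_zeros() call shown above.

```python
import warnings

import numpy as np
from scipy import sparse
from scipy.sparse import SparseEfficiencyWarning

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

# Suppress the efficiency warning only for this one assignment.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    M[:, [0, 2]] = 0

# The assignment stores explicit zeros; remove them afterwards.
M.eliminate_zeros()
print(M.toarray())
```

This hides the message but not the cost, so it is only reasonable when the assignment is not repeated in a loop.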

For indexed assignment, lil format is more efficient (or at least it doesn't warn about efficiency). But converting to/from that format is time consuming.

Another option is to find the stored values that should become 0, set them directly in the data attribute, and then call eliminate_zeros().
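A minimal sketch of that direct approach, assuming the matrix is in CSR format, where M.indices holds the column index of each stored value. The helper name zero_columns_inplace is made up for illustration; np.isin builds the mask over the stored entries.

```python
import numpy as np
from scipy import sparse

def zero_columns_inplace(mat, cols):
    """Zero the given columns of a CSR matrix by editing its data array.

    Hypothetical helper for illustration; assumes mat is in CSR format.
    """
    # In CSR, mat.indices[k] is the column of the stored value mat.data[k],
    # so this mask selects every stored entry in the requested columns.
    mask = np.isin(mat.indices, cols)
    mat.data[mask] = 0
    # Drop the explicit zeros so they no longer count as stored elements.
    mat.eliminate_zeros()
    return mat

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

zero_columns_inplace(M, np.array([0, 2]))
print(M.toarray())
```

Because it only touches existing stored values, this does not go through indexed assignment and should not trigger the efficiency warning.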

Another is to use a matrix multiply. Right-multiplying by a sparse diagonal matrix that has 0s at the columns to clear and 1s elsewhere should do the trick.

In [103]: M
Out[103]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [104]: D = sparse.diags([0,1,0], dtype=M.dtype)
In [105]: D
Out[105]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 3 stored elements (1 diagonals) in DIAgonal format>
In [106]: D.A
Out[106]: 
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])
In [107]: M1 = M*D
In [108]: M1
Out[108]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>
In [110]: M1.A
Out[110]: 
array([[0, 0, 0],
       [0, 0, 0],
       [0, 5, 0]], dtype=int64)
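The diagonal above was written out by hand for this 3x3 case. As a rough sketch, it could be built from an array of column indices instead; the helper name zero_columns_by_product is hypothetical, and the extra eliminate_zeros() call is only a precaution in case a given scipy version leaves explicit zeros in the product.

```python
import numpy as np
from scipy import sparse

def zero_columns_by_product(mat, cols):
    """Return a copy of mat with the given columns zeroed.

    Hypothetical helper: multiplies by a diagonal matrix of 0s and 1s.
    """
    # 1 on the diagonal keeps a column, 0 clears it.
    keep = np.ones(mat.shape[1], dtype=mat.dtype)
    keep[cols] = 0
    D = sparse.diags(keep, dtype=mat.dtype)
    result = (mat @ D).tocsr()
    # Precaution: remove any explicit zeros left in the product.
    result.eliminate_zeros()
    return result

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

M1 = zero_columns_by_product(M, np.array([0, 2]))
print(M1.toarray())
```

Unlike the in-place variants, this returns a new matrix and leaves M unchanged.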

If you multiply the selected columns in place, you don't get the efficiency warning. It only changes the values of existing stored terms, so it doesn't change the sparsity structure of the matrix (at least not until you eliminate zeros):

In [111]: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [112]: M[:,[0,2]] *= 0
In [113]: M
Out[113]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [114]: M.eliminate_zeros()
In [115]: M
Out[115]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>