In [87]: indptr = np.array([0, 2, 3, 6])
...: indices = np.array([0, 2, 2, 0, 1, 2])
...: data = np.array([1, 2, 3, 4, 5, 6])
...: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [88]: M
Out[88]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
Look at what happens with the csr assignment:
In [89]: M[:, [0, 2]] = 0
/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py:746: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
In [90]: M
Out[90]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [91]: M.data
Out[91]: array([0, 0, 0, 0, 0, 5, 0])
In [92]: M.indices
Out[92]: array([0, 2, 0, 2, 0, 1, 2], dtype=int32)
Not only does it give a warning, but it actually increases the number of stored 'sparse' terms (from 6 to 7), though all but one of them now have a 0 value. Those are only removed when we clean up:
In [93]: M.eliminate_zeros()
In [94]: M
Out[94]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>
In the indexed assignment, csr isn't distinguishing between setting 0s and other values. It treats them all the same.
I should note that the efficiency warning is given primarily to keep users from using this kind of assignment repeatedly (as in an iteration). For one-time actions it is overly alarmist.
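For a genuine one-off assignment, one way to keep the output quiet is to silence the warning just around that statement. This is only a sketch, assuming the same small matrix as above; it relies on the standard `warnings` module and on `sparse.SparseEfficiencyWarning`, and it does nothing about the cost of the assignment itself.

```python
import warnings

import numpy as np
from scipy import sparse

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

# Ignore the efficiency warning for this single assignment only;
# the filter is restored when the with-block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
    M[:, [0, 2]] = 0

# Drop whatever explicit zeros the assignment left in the structure.
M.eliminate_zeros()
dense = M.toarray()
```

Inside a loop this would just hide a real performance problem, so it only makes sense for the one-time case.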
For indexed assignment, lil format is more efficient (or at least it doesn't warn about efficiency). But converting to/from that format is time-consuming.
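The lil route would look roughly like the following. It is a sketch on the same small matrix, not a timing claim; whether the two conversions pay off depends on how many assignments are done in between.

```python
import numpy as np
from scipy import sparse

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

L = M.tolil()          # first conversion cost: csr -> lil
L[:, [0, 2]] = 0       # indexed assignment in lil format
M2 = L.tocsr()         # second conversion cost: lil -> csr
M2.eliminate_zeros()   # harmless if no explicit zeros were stored
dense2 = M2.toarray()
```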
Another option is to find and set the new 0s directly, followed by an eliminate_zeros call.
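One way to do that, sketched here under the assumption that the matrix is in csr format: in csr, `indices` holds the column index of each stored value, so a boolean mask over `indices` picks out exactly the stored terms in the target columns. The names `cols` and `mask` are just for illustration.

```python
import numpy as np
from scipy import sparse

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

cols = [0, 2]                      # columns to clear
mask = np.isin(M.indices, cols)    # True for stored values in those columns
M.data[mask] = 0                   # only existing terms are touched
M.eliminate_zeros()                # now shrink the sparsity structure
dense3 = M.toarray()
```

Since only existing entries of `M.data` are overwritten, the sparsity structure is not changed until `eliminate_zeros` runs.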
Another is to use a matrix multiply. I think a sparse diagonal matrix with 0s at the columns to be cleared (and 1s elsewhere) will do the trick.
In [103]: M
Out[103]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [104]: D = sparse.diags([0,1,0], dtype=M.dtype)
In [105]: D
Out[105]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
In [106]: D.A
Out[106]:
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 0]])
In [107]: M1 = M*D
In [108]: M1
Out[108]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>
In [110]: M1.A
Out[110]:
array([[0, 0, 0],
[0, 0, 0],
[0, 5, 0]], dtype=int64)
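The diagonal trick above can be wrapped up for an arbitrary list of columns. This is a sketch; `zero_columns` is a made-up helper name, not a scipy function, and it returns a new matrix rather than modifying `M`.

```python
import numpy as np
from scipy import sparse


def zero_columns(M, cols):
    """Return a copy of M with the given columns zeroed, computed as M @ D."""
    # 1 on the diagonal keeps a column, 0 clears it.
    keep = np.ones(M.shape[1], dtype=M.dtype)
    keep[cols] = 0
    D = sparse.diags(keep, dtype=M.dtype)
    return M @ D


indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

M1 = zero_columns(M, [0, 2])
dense4 = M1.toarray()
```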
If you multiply the matrix in-place, you don't get the efficiency warning. It's only changing the values of existing non-zero terms, so it isn't changing the sparsity of the matrix (at least not until you eliminate zeros):
In [111]: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [112]: M[:,[0,2]] *= 0
In [113]: M
Out[113]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [114]: M.eliminate_zeros()
In [115]: M
Out[115]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>