I have two scipy.sparse.csr_matrix objects: 'a' and a boolean matrix 'mask'. I want to set the elements of 'a' to zero wherever the corresponding element of 'mask' is True.
For example:
>>>a
<3x3 sparse matrix of type '<type 'numpy.int32'>'
with 4 stored elements in Compressed Sparse Row format>
>>>a.todense()
matrix([[0, 0, 3],
[0, 1, 5],
[7, 0, 0]])
>>>mask
<3x3 sparse matrix of type '<type 'numpy.bool_'>'
with 4 stored elements in Compressed Sparse Row format>
>>>mask.todense()
matrix([[ True, False, True],
[False, False, True],
[False, True, False]], dtype=bool)
I want to obtain the following result:
>>>result
<3x3 sparse matrix of type '<type 'numpy.int32'>'
with 2 stored elements in Compressed Sparse Row format>
>>>result.todense()
matrix([[0, 0, 0],
[0, 1, 0],
[7, 0, 0]])
I can do this with an operation like
result = a - a.multiply(mask)
or
a -= a.multiply(mask)  # In-place or copy, either is fine for me.
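For reference, here is a minimal, self-contained sketch that builds the small 3x3 example and applies the subtraction approach. The matrices are constructed from dense arrays purely for illustration (the real data would never be dense), and the final eliminate_zeros() call is a precaution rather than something the approach is known to need.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small matrices from the example; names a, mask, result follow the post.
a = csr_matrix(np.array([[0, 0, 3],
                         [0, 1, 5],
                         [7, 0, 0]], dtype=np.int32))
mask = csr_matrix(np.array([[True, False, True],
                            [False, False, True],
                            [False, True, False]]))

# a.multiply(mask) keeps only the entries of a where mask is True;
# subtracting them from a clears exactly those positions.
result = a - a.multiply(mask)
result.eliminate_zeros()  # precaution: drop explicitly stored zeros, if any

print(result.toarray())
```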
But I think the above operations are inefficient. The actual shape of 'a' and 'mask' is 67,108,864 × 2,000,000, so these operations take several seconds even on a high-spec server (64-core Xeon CPU, 512 GB memory). For example, when 'a' has about 30,000,000 non-zero elements and 'mask' has about 1,800,000 non-zero (True) elements, the above operation takes about 2 seconds.
Is there a more efficient way to do this?
The conditions are as follows:
- a.getnnz() != mask.getnnz()
- a.shape == mask.shape
Thanks!
Another approach I tried:
a.data *= ~np.array(mask[a.astype(bool)]).flatten()
a.eliminate_zeros()  # This takes about twice as long as the method above.
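The attempt above can also be written with the coordinates of the stored entries spelled out, which makes it easier to see what is being looked up. This is only a sketch of the same idea on the small example, not a claim that it is faster: it builds the row index of every stored entry of 'a' from indptr, samples 'mask' at those (row, col) pairs, and zeroes the matching values. The names rows, cols, hit and masked are illustrative and not part of the original post.

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[0, 0, 3],
                         [0, 1, 5],
                         [7, 0, 0]], dtype=np.int32))
mask = csr_matrix(np.array([[True, False, True],
                            [False, False, True],
                            [False, True, False]]))

# Row index of every stored entry of a, in the same order as a.data
# (CSR keeps row r in a.data[a.indptr[r]:a.indptr[r + 1]]).
rows = np.repeat(np.arange(a.shape[0]), np.diff(a.indptr))
cols = a.indices

# Sample mask at those coordinates; np.asarray(...).ravel() flattens
# whatever container the fancy indexing returns into a 1-D array.
hit = np.asarray(mask[rows, cols]).ravel().astype(bool)

masked = a.copy()        # work on a copy so a stays unchanged
masked.data[hit] = 0     # zero the stored values that the mask covers
masked.eliminate_zeros() # remove the now-zero entries from the structure

print(masked.toarray())
```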
How do the nnz of a and mask compare? Besides not being the same, are both equally sparse? – hpaulj