The SciPy Sparse Matrix tutorial is very good -- but it actually leaves the section on slicing un(der)developed (still in outline form -- see section: "Handling Sparse Matrices").
I will try to update the tutorial once this question is answered.
I have a large sparse matrix -- currently in dok_matrix format.
import numpy as np
from scipy import sparse
M = sparse.dok_matrix((10**6, 10**6))
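A common pattern is to fill the matrix in DOK format, which is cheap for element-by-element assignment, and then convert it once to CSR/CSC before doing any slicing. This is a minimal sketch. It assumes the matrix is built entry by entry. The small size `n` and the filled entries are made up for illustration, and the real matrix here is 10**6 x 10**6.

```python
import numpy as np
from scipy import sparse

n = 1000  # hypothetical small size so the example runs quickly
M = sparse.dok_matrix((n, n))

# DOK is convenient for incremental, element-by-element construction
for i in range(0, n, 7):
    M[i, (3 * i) % n] = 1.0 + i  # made-up values, one entry per chosen row

# Convert once, up front; each copy can then be reused for many slices
M_csr = M.tocsr()  # compressed by row: suited to row selection
M_csc = M.tocsc()  # compressed by column: suited to column selection

print(M.nnz, M_csr.nnz, M_csc.nnz)
```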
For some methods I want to be able to slice columns, and for others I want to slice rows. Ideally I would use advanced indexing (i.e. a boolean vector, bool_vect) to slice a sparse matrix M, as in:
bool_vect = np.arange(10**6) % 2 == 0  # True at every even index
out = M[bool_vect,:] # Want to select every even row
or
out = M[:,bool_vect] # Want to select every even column
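An expression like `np.arange(n) % 2` on its own produces the integers 0 and 1, not booleans. Used as an index, it selects rows 0 and 1 over and over instead of acting as a mask. Comparing with `== 0` gives a genuine boolean mask of the even positions. The sketch below checks both on a tiny matrix against its dense copy. The size and the random fill are arbitrary assumptions.

```python
import numpy as np
from scipy import sparse

n = 10  # tiny hypothetical size so results can be checked against a dense array
rng = np.random.default_rng(0)
D = rng.random((n, n))
D[D < 0.7] = 0.0  # zero out roughly 70% of the entries
A = sparse.csr_matrix(D)

int_vect = np.arange(n) % 2        # int array 0, 1, 0, 1, ... (NOT a mask)
bool_vect = np.arange(n) % 2 == 0  # bool array, True at every even index

rows_by_int = A[int_vect, :]    # rows 0 and 1 repeated, still n rows in total
rows_by_mask = A[bool_vect, :]  # rows 0, 2, 4, 6, 8

print(rows_by_int.shape, rows_by_mask.shape)  # (10, 10) (5, 10)
```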
First off, dok_matrix does not support this, but I think it works (slowly) if I first convert to a lil_matrix via sparse.lil_matrix(M).
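Which formats accept a boolean mask may depend on the installed SciPy version. A quick probe like the one below tries the same row mask on several formats and compares each result with plain NumPy. This is a sketch and makes no speed claims. The tiny matrix is made up, and any exception is simply recorded as "not supported".

```python
import numpy as np
from scipy import sparse

n = 8  # tiny hypothetical size
dense = np.arange(n * n, dtype=float).reshape(n, n)
mask = np.arange(n) % 2 == 0  # True at even row indices

supported = {}
for fmt in ("dok", "lil", "csr", "csc", "coo"):
    M = sparse.csr_matrix(dense).asformat(fmt)
    try:
        rows = M[mask, :]
        # Check against the dense NumPy result to be sure the mask was honoured
        supported[fmt] = bool(np.array_equal(rows.toarray(), dense[mask]))
    except Exception as exc:  # any failure is treated as "no mask indexing here"
        supported[fmt] = f"not supported ({type(exc).__name__})"

for fmt, result in supported.items():
    print(fmt, result)
```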
As far as I can gather from the tutorial, to slice columns I want to use CSC and to slice rows I want to use CSR. So does that mean I should convert the matrix M via:
M.tocsc()[:,bool_vect]
or
M.tocsr()[bool_vect,:]
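The row/column pairing above can be sketched as follows. Convert once and keep the CSR and CSC copies around. Writing `M.tocsc()[:, mask]` inside a loop would redo the conversion on every call. The random COO construction and the size `n` are assumptions for illustration only.

```python
import numpy as np
from scipy import sparse

n = 200  # hypothetical size; the matrix in the question is 10**6 x 10**6
rng = np.random.default_rng(0)
rows_idx = rng.integers(0, n, size=5 * n)
cols_idx = rng.integers(0, n, size=5 * n)
vals = rng.random(5 * n)
# Duplicate (row, col) pairs are summed when converting COO -> CSR/CSC
M = sparse.coo_matrix((vals, (rows_idx, cols_idx)), shape=(n, n))

# Convert once; keep both copies if both kinds of slicing are needed
M_csr = M.tocsr()
M_csc = M.tocsc()

mask = np.arange(n) % 2 == 0  # True at every even index

even_rows = M_csr[mask, :]  # row selection on the row-compressed copy
even_cols = M_csc[:, mask]  # column selection on the column-compressed copy

print(even_rows.shape, even_cols.shape)  # (100, 200) (200, 100)
```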
I am kinda guessing here and my code is slow because of it. Any help from someone who understands how this works would be appreciated. Thanks in advance.
If it turns out I should index my matrix with a list of integer indices rather than a boolean array, I would be happy to find that out too; whichever is more efficient.
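A boolean mask and the equivalent integer indices (obtained with `np.flatnonzero`) select the same rows in the same order. As far as I know, recent SciPy versions turn a boolean mask into integer positions internally, so the two are likely to perform similarly. This sketch does not measure that. For a regular stride like "every even row", a plain step slice also works. The matrix below is made up.

```python
import numpy as np
from scipy import sparse

n = 100  # hypothetical small size
rng = np.random.default_rng(1)
dense = rng.random((n, n))
dense[dense < 0.9] = 0.0  # keep roughly 10% of the entries
M_csr = sparse.csr_matrix(dense)

mask = np.arange(n) % 2 == 0  # boolean mask
idx = np.flatnonzero(mask)    # same selection as integer positions: 0, 2, 4, ...

by_mask = M_csr[mask, :]
by_idx = M_csr[idx, :]
by_slice = M_csr[::2, :]      # a step slice also expresses "every even row"

print(by_mask.shape, by_idx.shape, by_slice.shape)  # (50, 100) (50, 100) (50, 100)
```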
Finally -- this is a big matrix, so bonus points if this can happen in place / with broadcasting.
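Another way to select rows or columns, swapped in here rather than taken from the question, is to multiply by a sparse 0/1 selection matrix. This still builds a new matrix, so it is not in place. As far as I know, fancy indexing on SciPy sparse matrices also returns copies rather than views. The helper `row_selector` is hypothetical, and the data is made up.

```python
import numpy as np
from scipy import sparse

def row_selector(idx, n):
    """Hypothetical helper: sparse k x n matrix S with S[r, idx[r]] = 1,
    so that S @ M picks rows idx of an n-row matrix M."""
    k = len(idx)
    return sparse.csr_matrix((np.ones(k), (np.arange(k), idx)), shape=(k, n))

n = 100  # hypothetical small size
rng = np.random.default_rng(2)
dense = rng.random((n, n))
dense[dense < 0.9] = 0.0
M = sparse.csr_matrix(dense)

idx = np.flatnonzero(np.arange(n) % 2 == 0)  # even positions
S = row_selector(idx, n)

even_rows = S @ M    # rows 0, 2, 4, ... of M
even_cols = M @ S.T  # columns 0, 2, 4, ... of M (S.T is n x k)

print(even_rows.shape, even_cols.shape)  # (50, 100) (100, 50)
```

The same selector `S` serves for both directions, which can be handy if one index set is reused many times.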