I have many lists, each representing a sparse matrix (i.e., the columns that have nonzero entries), that I need to represent as SciPy sparse csc_matrix objects. Note that there is only one row in each sparse matrix, so the list simply points to the columns within this row that have nonzero entries. For example:
sparse_input = [4, 10, 21] # My lists are much, much longer but very sparse
This list tells me which columns of my single-row sparse matrix contain nonzero values. This is what the dense matrix would look like:
x = np.array([[0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1]])
I could use the (data, (row, col)) syntax, but since my lists are super long, the csc_matrix takes a lot of time and memory to build. So I was thinking about using the indptr interface, but I'm having trouble figuring out how to quickly and automatically build the indptr directly from a given sparse list of nonzero column entries. I tried looking at csc_matrix(x).indptr and I see that the indptr looks like:
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3], dtype=int32)
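For reference, here is a minimal sketch that reproduces this output using the (data, (row, col)) syntax. It assumes the matrix is 22 columns wide (the width of x above) and that every nonzero value is 1; the names n_cols, data, rows, cols, and m are only for illustration.

```python
import numpy as np
from scipy.sparse import csc_matrix

sparse_input = [4, 10, 21]
n_cols = 22  # assumption: total column count, taken from the dense example x

# (data, (row, col)) construction: every nonzero is a 1 located in row 0
data = np.ones(len(sparse_input), dtype=np.int64)
rows = np.zeros(len(sparse_input), dtype=np.int64)
cols = np.asarray(sparse_input)
m = csc_matrix((data, (rows, cols)), shape=(1, n_cols))

# CSC keeps one pointer per column plus a final one,
# so indptr has n_cols + 1 entries no matter how few nonzeros exist
print(m.indptr)
print(len(m.indptr) == n_cols + 1)
```

So the length of indptr here is tied to the number of columns, not to the number of nonzero entries.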
I've read the SciPy docs and the Sparse Matrix Wikipedia page, but I can't seem to come up with an efficient method to construct the indptr directly from a list of nonzero columns. It just feels like indptr shouldn't be this long, considering that there are only three nonzero entries in the sparse matrix.
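One possible direction, offered only as a sketch: in CSC format, indptr[j] is the number of stored entries in columns before j, so for a sorted, duplicate-free list it could be computed with np.searchsorted. The sketch assumes 22 columns, all values equal to 1, and a sorted input; n_cols, row_indices, m_csc, and m_csr are illustrative names. For comparison it also builds the same row as a csr_matrix, where indptr has only n_rows + 1 entries. Whether either route is actually faster than the (data, (row, col)) syntax on very long lists is not established here.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

sparse_input = [4, 10, 21]  # assumed sorted and free of duplicates
n_cols = 22                 # assumption: total number of columns

cols = np.asarray(sparse_input)
data = np.ones(len(cols), dtype=np.int64)  # assumption: all nonzeros are 1

# CSC: indptr[j] counts the listed columns that are strictly less than j
indptr = np.searchsorted(cols, np.arange(n_cols + 1), side="left")
row_indices = np.zeros(len(cols), dtype=np.int64)  # every entry sits in row 0
m_csc = csc_matrix((data, row_indices, indptr), shape=(1, n_cols))

# CSR: the column list is used directly as indices, and indptr is just [0, nnz]
m_csr = csr_matrix((data, cols, np.array([0, len(cols)])), shape=(1, n_cols))

print(indptr)
print(m_csr.indptr)
```

The CSC indptr still has n_cols + 1 entries, since that is how the format is defined; the short pointer array only appears in the CSR layout for a single-row matrix.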