I need to identify the rows (/columns) that have defined values in a large sparse Boolean Matrix. I want to use this to 1. slice (actually view
) the Matrix by those rows/columns; and 2. slice (/view
) vectors and matrices that have the same dimensions as the margins of a Matrix. I.e. the result should probably be a Vector of indices / Bools or (preferably) an iterator.
I've tried the obvious:
a = sprand(10000, 10000, 0.01)
cols = unique(a.colptr)
rows = unique(a.rowvals)
but each of these take like 20ms on my machine, probably because they allocate about 1MB (at least they allocate cols
and rows
). This is inside a performance-critical function, so I'd like the code to be optimized. The Base code seems to have an nzrange
iterator for sparse matrices, but it is not easy for me to see how to apply that to my case.
Is there a suggested way of doing this?
Second question: I'd need to also perform this operation on views of my sparse Matrix - would that be something like x = view(a,:,:); cols = unique(x.parent.colptr[x.indices[:,2]])
or is there specialized functionality for this? Views of sparse matrices appear to be tricky (cf https://discourse.julialang.org/t/slow-arithmetic-on-views-of-sparse-matrices/3644 – not a cross-post)
Thanks a lot!
cols
implementation has a bug, for exampleunique(speye(2).colptr)
is a 3 element vector. See full answer for more info. – Dan Getz