I have two scipy sparse csr matrices with the exact same shape but potentially different data values and nnz. I now want to get the top 10 elements of one matrix and increase the values at the same indices in the other matrix. My current approach is as follows:
idx = a.data.argpartition(-10)[-10:]
i, j = a.nonzero()
i_idx = i[idx]
j_idx = j[idx]
b[i_idx, j_idx] += 1
The reason I have to go this way is that a.data and b.data do not necessarily have the same number of elements and hence the indices would differ.
My question now is whether I can improve this in some way. As far as I know the nonzero procedure is not elegant, as I have to allocate two new arrays and I am very tight on memory already. I can get the j_indices via csr_matrix.indices, but what about the i_indices? Can I use the indptr in a nice way for that?
Happy for any hints.
`indptr` has one value per row (plus 1). It indicates where each row starts in the `data` and `indices` arrays. You can do the math, or you can convert the array with `tocoo()`. Then `row` and `col` have the values you want. But beware, there are some warnings that the indices may not be sorted. – hpaulj
`nonzero`. It converts the matrix to `coo` and returns the `row` and `col`. – hpaulj
Does `top 10 elements` mean the first 10 nonzeros in CSR format? – paul-g
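To make the "do the math" route from the first comment concrete, here is a minimal sketch that recovers the row index only for the selected positions, so no full-length row array is built. The helper name `top_k_coords` and the small matrices are made up for illustration. It assumes `a` is in CSR format and that "top 10" means the largest stored values in `a.data`, as in the question.

```python
import numpy as np
from scipy.sparse import csr_matrix

def top_k_coords(a, k):
    """Return (rows, cols) of the k largest stored values of CSR matrix `a`.

    Hypothetical helper, not part of scipy.
    """
    k = min(k, a.nnz)
    if k == 0:
        return np.array([], dtype=np.intp), np.array([], dtype=np.intp)
    # Positions of the k largest entries within a.data (in no particular order).
    pos = np.argpartition(a.data, -k)[-k:]
    # Row r owns positions indptr[r] .. indptr[r+1]-1, so the row of a position
    # is the last r with indptr[r] <= pos. side="right" skips over empty rows.
    rows = np.searchsorted(a.indptr, pos, side="right") - 1
    cols = a.indices[pos]
    return rows, cols

# Small made-up example: row 1 of `a` is empty on purpose.
a = csr_matrix(np.array([[0, 5, 0],
                         [0, 0, 0],
                         [7, 0, 3]]))
b = csr_matrix(np.array([[1, 0, 0],
                         [0, 2, 0],
                         [4, 0, 0]]))

rows, cols = top_k_coords(a, 2)
# May emit a SparseEfficiencyWarning when an entry is not yet stored in b.
b[rows, cols] += 1
```

The extra memory here is proportional to k rather than to nnz, since `searchsorted` runs only over the selected positions. The `tocoo()` route gives the same coordinates but materialises a row array of length nnz.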