Find given row in a scipy sparse matrix?

Question

The question is simple: given a row r taken from a scipy sparse matrix M (100,000 x 500,000), how can I efficiently find its location (row index) in M?

Currently I am trying the following way, but it is horribly slow.

import numpy as np

offset = 500                      # number of rows to densify per chunk
row = row.toarray()               # convert the sparse query row to a dense 1 x n_cols array
matches = []                      # indices of the rows of M that equal row
for begin in range(0, M.shape[0], offset):
    end = min(begin + offset, M.shape[0])
    sub_M = M[begin:end, :].toarray()      # M is too big for its dense form to fit in memory
    labels = np.all(row == sub_M, axis=1)  # True where a row of the dense chunk equals row
    matches.extend(begin + np.flatnonzero(labels))  # keep the hits instead of overwriting them

jrennie · Accepted Answer · 2012-12-21T02:41:24

Unless you want to dig into the internals of one or more sparse matrix types, you should use CSR format for your matrix and:

Calculate the length (L2 norm) of each matrix row; in other words: sqrt(sum(multiply(M, M), axis=1))
Normalize r to (L2) length 1
Matrix multiply M*r (where r is treated as a column vector)

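If an entry of M*r matches the length of the corresponding row, then that row points in the same direction as r (it is a non-negative multiple of r). If the row's length also equals the original length of r, you have a match.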

Note that for a 1-D vector, the default ord for numpy.linalg.norm gives the L2 norm.
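Here is a minimal sketch of the approach described in this answer, assuming M is a scipy sparse matrix and r is a 1 x n_cols sparse row. The name find_row and the rtol parameter are made up for illustration and are not part of scipy. Because the norm comparison is done in floating point with a tolerance, the sketch treats the result as a candidate list and confirms each candidate with an exact sparse comparison.

```python
import numpy as np
import scipy.sparse as sp


def find_row(M, r, rtol=1e-9):
    """Return the indices of the rows of M that are equal to r.

    find_row and rtol are hypothetical names used for this sketch.
    """
    M = sp.csr_matrix(M, dtype=np.float64)
    r = sp.csr_matrix(r, dtype=np.float64)  # 1 x n_cols

    # L2 length of each row: sqrt of the row sums of the element-wise squares.
    row_len = np.sqrt(np.asarray(M.multiply(M).sum(axis=1)).ravel())
    r_len = np.sqrt(r.multiply(r).sum())

    if r_len == 0.0:
        # An all-zero query can only match the all-zero rows of M.
        return [int(i) for i in np.flatnonzero(row_len == 0.0)]

    # Dot product of every row of M with the unit-length version of r.
    proj = (M @ r.T).toarray().ravel() / r_len

    # proj equals row_len only when the row is a non-negative multiple of r;
    # requiring equal lengths as well pins that multiple to 1.
    same_direction = np.isclose(proj, row_len, rtol=rtol, atol=0.0)
    same_length = np.isclose(row_len, r_len, rtol=rtol, atol=0.0)
    candidates = np.flatnonzero(same_direction & same_length)

    # The tolerance test can admit near-misses, so confirm each one exactly.
    return [int(i) for i in candidates if (M[i] != r).nnz == 0]


# Small usage example: rows 0 and 3 are identical, row 1 is a multiple of them.
M = sp.csr_matrix(np.array([[0, 1, 0, 2],
                            [0, 2, 0, 4],
                            [0, 0, 0, 0],
                            [0, 1, 0, 2],
                            [3, 0, 0, 0]]))
print(find_row(M, M[3]))
```

The whole search is one sparse matrix-vector product plus a handful of cheap exact checks, so nothing the size of M is ever densified.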
