python sparse matrix get maximum values and index

Question

I have a sparse matrix A (10 x 3 when dense), such as:

print type(A)
<class 'scipy.sparse.csr.csr_matrix'>

print A
(0, 0)  0.0160478743808
(0, 2)  0.0317314165078
(1, 2)  0.0156596521648
(1, 0)  0.0575683686558
(2, 2)  0.0107481166871
(3, 0)  0.0150580924929
(3, 2)  0.0297743235876
(4, 0)  0.0161931803955
(4, 2)  0.0320187296788
(5, 2)  0.0106034409766
(5, 0)  0.0128109177074
(6, 2)  0.0105766993238
(6, 0)  0.0127786088452
(7, 2)  0.00926522256063
(7, 0)  0.0111941023699

The max value of each column is:

print A.max(axis=0)
(0, 0)  0.0575683686558
(0, 2)  0.0320187296788

I would like to get the row index of each column's maximum value. I know that

A.getcol(i).tolist()

returns each column as a list, which lets me use the argmax() function, but this is really slow. Is there a decent way to do this?

Is your matrix able to fit in memory? Doing A.todense().argmax(axis=0) would do what you want as long as the A.todense() is possible. — kbrose
argmax would be a nice enhancement to the scipy sparse matrices. In the meantime: Can you switch to CSC format? If so, there is a way to get the argmax of the columns fairly efficiently. — Warren Weckesser
@kbrose, .todense() is not possible because the data doesn't fit in memory. — KEXIN WANG

Warren Weckesser · Accepted Answer · 2016-07-11T15:42:16

This is a slight variation of the method you suggested in the question:

col_argmax = [A.getcol(i).A.argmax() for i in range(A.shape[1])]

(The .A attribute is equivalent to .toarray().)

A potentially more efficient alternative is

B = A.tocsc()
col_argmax = [B.indices[B.indptr[i] + B.data[B.indptr[i]:B.indptr[i+1]].argmax()] for i in range(len(B.indptr)-1)]
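The list comprehension above calls argmax() on each column's slice of B.data, so it raises a ValueError for a column with no stored entries (column 1 in the question's matrix). A minimal sketch of a more defensive variant follows. The function name csc_col_argmax and the -1 sentinel for empty columns are illustrative choices, not part of SciPy. It still compares stored values only, so it assumes the maximum of each non-empty column is a stored entry, which holds when the largest stored value is positive.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csc_col_argmax(A, empty=-1):
    """Row index of the largest stored value in each column of A.

    Hypothetical helper: columns with no stored entries get `empty`.
    Only stored values are compared, so implicit zeros are ignored.
    """
    B = A.tocsc()
    result = np.full(B.shape[1], empty, dtype=np.int64)
    for j in range(B.shape[1]):
        start, end = B.indptr[j], B.indptr[j + 1]
        if end > start:  # skip columns without stored entries
            result[j] = B.indices[start + B.data[start:end].argmax()]
    return result


# The (row, col, value) triples printed in the question.
rows = [0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
cols = [0, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 0, 2, 0]
vals = [0.0160478743808, 0.0317314165078, 0.0156596521648,
        0.0575683686558, 0.0107481166871, 0.0150580924929,
        0.0297743235876, 0.0161931803955, 0.0320187296788,
        0.0106034409766, 0.0128109177074, 0.0105766993238,
        0.0127786088452, 0.00926522256063, 0.0111941023699]
A = csr_matrix((vals, (rows, cols)), shape=(10, 3))

col_argmax = csc_col_argmax(A)
print(col_argmax)
```

Only the three CSC arrays (indptr, indices, data) are touched, so nothing is converted to a dense array, which matters when the matrix does not fit in memory.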

Either of the above will work, but I have to ask: if your array has shape (10, 3), why are you using a sparse matrix? (10, 3) is small! Just use a regular, dense numpy array.

Even if you keep A as a sparse matrix, the most efficient way to compute the argmax of the columns of a matrix that size is probably to just convert it to a dense array and use the argmax method:

col_argmax = A.A.argmax(axis=0)
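SciPy releases newer than this answer give sparse matrices their own argmax method (added around SciPy 0.19, as far as I know), which avoids both the Python loop and the dense conversion. A short sketch, assuming such a version is installed; the small matrix here is made up for illustration and is not the question's data. Implicit zeros take part in the comparison, so a column with no stored entries reports row 0.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small illustrative matrix; column 1 has no stored entries.
dense = np.array([[0.016, 0.0, 0.031],
                  [0.057, 0.0, 0.015],
                  [0.0,   0.0, 0.010],
                  [0.015, 0.0, 0.032]])
A = csr_matrix(dense)

# argmax along axis 0 gives one row index per column; the result may
# come back 2-D, so flatten it to a plain 1-D array.
builtin_argmax = np.asarray(A.argmax(axis=0)).ravel()

# Cross-check against the dense computation from the answer.
dense_argmax = A.toarray().argmax(axis=0)

print(builtin_argmax, dense_argmax)
```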
