I've seen discussion on the scipy GitHub about giving more control over this summing, but I don't know of any production changes. As the docs indicate, there's a long-standing tradition of summing the duplicates.
As created, the coo matrix does not sum; it just assigns your parameters to its attributes:
In [697]: S = sparse.coo_matrix((data, (rows, cols)))
In [698]: S.data
Out[698]: array([1, 1])
In [699]: S.row
Out[699]: array([0, 0], dtype=int32)
In [700]: S.col
Out[700]: array([0, 0], dtype=int32)
Converting to dense (or to csr/csc) does sum - but doesn't change S itself:
In [701]: S.A
Out[701]: array([[2]])
In [702]: S.data
Out[702]: array([1, 1])
You can perform the summing in place with:
In [703]: S.sum_duplicates()
In [704]: S.data
Out[704]: array([2], dtype=int32)
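For reference, the session above can be condensed into a standalone script. This is only a sketch: the `data`, `rows` and `cols` arrays are assumed from the outputs shown (two entries, both at position (0, 0)), and it uses `toarray()` in place of the `.A` shorthand, which may not be available in every SciPy version.

```python
import numpy as np
from scipy import sparse

# Assumed inputs, inferred from the outputs above: two entries at (0, 0)
data = np.array([1, 1])
rows = np.array([0, 0])
cols = np.array([0, 0])

S = sparse.coo_matrix((data, (rows, cols)))
stored_before = S.data.copy()        # both duplicates are still stored

dense = S.toarray()                  # the conversion sums the duplicates
stored_after_dense = S.data.copy()   # ... but S itself is unchanged

S.sum_duplicates()                   # in-place sum on S
stored_after_sum = S.data.copy()

print(stored_before, dense, stored_after_dense, stored_after_sum)
```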
I don't know of a way of either removing the duplicates or bypassing that action. I may look up the relevant issue.
=================
S.todok() does an in-place sum (that is, it changes S). Looking at that code, I see that it calls self.sum_duplicates. The following replicates the conversion without the sum:
In [727]: dok=sparse.dok_matrix((S.shape),dtype=S.dtype)
In [728]: dok.update(zip(zip(S.row,S.col),S.data))
In [729]: dok
Out[729]:
<1x1 sparse matrix of type '<class 'numpy.int32'>'
with 1 stored elements in Dictionary Of Keys format>
In [730]: print(dok)
(0, 0) 1
In [731]: S
Out[731]:
<1x1 sparse matrix of type '<class 'numpy.int32'>'
with 2 stored elements in COOrdinate format>
In [732]: dok.A
Out[732]: array([[1]])
It's a dictionary update, so the final value is the last of the duplicates. I found elsewhere that dok.update is a pretty fast way of adding values to a sparse matrix.
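Since the availability of `dok.update` may vary between SciPy versions, here is a minimal sketch of the same last-one-wins conversion using plain item assignment, which is likely slower but does not rely on the dictionary interface. The helper name `coo_to_dok_last_wins` and the sample values are made up for illustration.

```python
import numpy as np
from scipy import sparse

def coo_to_dok_last_wins(S):
    """Copy a COO matrix into a DOK matrix without summing duplicates.

    Hypothetical helper: a later duplicate overwrites an earlier one.
    """
    dok = sparse.dok_matrix(S.shape, dtype=S.dtype)
    for r, c, v in zip(S.row, S.col, S.data):
        dok[int(r), int(c)] = v      # plain assignment, so the last value wins
    return dok

# Assumed sample data: two different values at the same position (0, 0)
S = sparse.coo_matrix((np.array([1, 5]), (np.array([0, 0]), np.array([0, 0]))))
dok = coo_to_dok_last_wins(S)

print(dok[0, 0])   # value of the last duplicate
print(S.nnz)       # S still stores both entries
```

Because the loop only reads `S.row`, `S.col` and `S.data`, the original COO matrix is left untouched.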
tocsr inherently does the sum; tolil uses tocsr; so this todok approach may be simplest.
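If csr is the target format, another option is to drop the duplicates from the input arrays before building the matrix, so that the sum in tocsr has nothing to add up. The following is a sketch of that idea with NumPy, not something taken from scipy itself; the helper name `coo_keep_last` and the sample values are assumptions for illustration.

```python
import numpy as np
from scipy import sparse

def coo_keep_last(rows, cols, data, shape):
    """Build a COO matrix keeping only the last value for each (row, col).

    Hypothetical helper; duplicates are removed rather than summed.
    """
    rows, cols, data = np.asarray(rows), np.asarray(cols), np.asarray(data)
    flat = np.ravel_multi_index((rows, cols), shape)   # one integer key per position
    # np.unique reports the first occurrence of each key; searching the
    # reversed array turns that into the last occurrence in the original
    _, idx_reversed = np.unique(flat[::-1], return_index=True)
    keep = len(flat) - 1 - idx_reversed
    return sparse.coo_matrix((data[keep], (rows[keep], cols[keep])), shape=shape)

# Assumed sample data: (0, 0) appears twice, (1, 1) once
M = coo_keep_last([0, 0, 1], [0, 0, 1], [1, 5, 7], shape=(2, 2))
csr = M.tocsr()    # nothing left to sum
print(csr.toarray())
```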