You're using the sparse(I, J, SV) form [note: link goes to documentation for GNU Octave, not Matlab]. The scipy.sparse equivalent is csr_matrix((SV, (I, J))) -- yes, a single argument which is a 2-tuple containing a vector and a 2-tuple of vectors. You also have to correct the index vectors because Python consistently uses 0-based indexing.
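Here is a minimal sketch of that index correction. It assumes the original Matlab call was sparse([3 2], [2 4], [3 0]); that call is inferred from the 0-based indices in the session below and is not stated in the question:

```python
import numpy as np
import scipy.sparse as sps

# Assumed Matlab call: sparse([3 2], [2 4], [3 0]) with 1-based indices
I_matlab = np.array([3, 2])
J_matlab = np.array([2, 4])
SV = np.array([3, 0])

# Python indexes from 0, so shift both index vectors down by one
I = I_matlab - 1
J = J_matlab - 1

# The single argument is (data, (row_indices, col_indices))
m = sps.csr_matrix((SV, (I, J)))
print(m.shape)  # (3, 4): inferred from the largest row and column index
print(m[2, 1])  # Matlab's element (3, 2)
```

If you need a specific size, as with Matlab's sparse(I, J, SV, m, n), csr_matrix also accepts a shape=(M, N) keyword.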
>>> import numpy as np
>>> import scipy.sparse as sps
>>> m = sps.csr_matrix(([3,0], ([2,1], [1,3]))); m
<3x4 sparse matrix of type '<class 'numpy.int64'>'
	with 2 stored elements in Compressed Sparse Row format>
>>> m.todense()
matrix([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 3, 0, 0]], dtype=int64)
Note that scipy, unlike Matlab, does not automatically discard explicit zeros, and will use integer storage for matrices containing only integers. To match the matrix you got in Matlab exactly, you must explicitly ask for floating-point storage and call eliminate_zeros() on the result:
>>> m2 = sps.csr_matrix(([3,0], ([2,1], [1,3])), dtype=np.float64)
>>> m2.eliminate_zeros()
>>> m2
<3x4 sparse matrix of type '<class 'numpy.float64'>'
	with 1 stored elements in Compressed Sparse Row format>
>>> m2.todense()
matrix([[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  3.,  0.,  0.]])
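The index shift, the float storage and the zero elimination can be bundled into one function. This is only a sketch: matlab_sparse is a hypothetical name, not part of SciPy, and it assumes 1-based integer index vectors, as Matlab would take them:

```python
import numpy as np
import scipy.sparse as sps


def matlab_sparse(I, J, SV, m=None, n=None):
    """Hypothetical helper mimicking Matlab's sparse(I, J, SV[, m, n])."""
    # Convert Matlab's 1-based indices to Python's 0-based indices
    rows = np.asarray(I) - 1
    cols = np.asarray(J) - 1
    # Only fix the shape when both dimensions are given; otherwise SciPy
    # infers it from the largest indices
    shape = (m, n) if m is not None and n is not None else None
    # Force floating-point storage, like Matlab's double sparse matrices
    result = sps.csr_matrix((SV, (rows, cols)), shape=shape, dtype=np.float64)
    # Matlab drops explicit zeros; SciPy keeps them unless asked not to
    result.eliminate_zeros()
    return result


m2 = matlab_sparse([3, 2], [2, 4], [3, 0])
print(m2.nnz)  # 1
print(m2.toarray())
```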
You could also change [3,0] to [3., 0.], but I recommend an explicit dtype= argument because that will prevent surprises when you are feeding in real data.
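One kind of surprise this guards against is sketched below. It is an illustrative assumption about how data might later be modified in place, not something from the question: without dtype=, the storage type depends on whatever values happen to arrive, and writing fractional results back into integer storage silently truncates them.

```python
import numpy as np
import scipy.sparse as sps

rows, cols = [2, 1], [1, 3]

# Without dtype=, SciPy picks the storage type from the values it receives
inferred = sps.csr_matrix(([3, 0], (rows, cols)))
# With dtype=, the storage type is fixed regardless of the values
pinned = sps.csr_matrix(([3, 0], (rows, cols)), dtype=np.float64)

print(inferred.dtype, pinned.dtype)  # an integer type, then float64

# NumPy element assignment casts unsafely, so 3 / 2 = 1.5 becomes 1
# in the integer matrix but stays 1.5 in the float matrix
inferred.data[:] = inferred.data / 2
pinned.data[:] = pinned.data / 2
print(inferred[2, 1], pinned[2, 1])  # 1 1.5
```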
(I don't know what Matlab's internal sparse matrix representation is, but Octave appears to default to compressed sparse column representation. The difference between CSC and CSR should only affect performance. If your NumPy code winds up being slower than your Matlab code, try using sps.csc_matrix instead of csr_matrix, as well as all the usual NumPy performance tips.)
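As a quick check that CSR versus CSC changes only the storage layout and not the matrix itself, this sketch builds the same example both ways and also converts an existing matrix with tocsc() instead of rebuilding it:

```python
import numpy as np
import scipy.sparse as sps

data, rows, cols = [3, 0], [2, 1], [1, 3]

# Same (data, (rows, cols)) argument, different compressed layout
a_csr = sps.csr_matrix((data, (rows, cols)), dtype=np.float64)
a_csc = sps.csc_matrix((data, (rows, cols)), dtype=np.float64)

# Element-wise != gives a sparse boolean matrix that stores only differences
print((a_csr != a_csc).nnz)  # 0: identical contents

# An existing matrix can also be converted instead of rebuilt
converted = a_csr.tocsc()
print(converted.format)  # csc
print(np.array_equal(converted.toarray(), a_csc.toarray()))  # True
```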
(You probably need to read NumPy for Matlab users if you haven't already.)