In [211]: M = sparse.csr_matrix(np.eye(3))
In [212]: M
Out[212]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
Indexing with [0] produces a new sparse matrix with (1,3) shape:
In [213]: M[0]
Out[213]:
<1x3 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
Trying to index that again gives another sparse matrix, or an error. That's because it is still a 2d object with (1,3) shape.
In [214]: M[0][1]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-214-0661a1f27e52> in <module>
----> 1 M[0][1]
/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in __getitem__(self, key)
290 # [i, 1:2]
291 elif isinstance(col, slice):
--> 292 return self._get_row_slice(row, col)
293 # [i, [1, 2]]
294 elif issequence(col):
/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in _get_row_slice(self, i, cslice)
397
398 if i < 0 or i >= M:
--> 399 raise IndexError('index (%d) out of range' % i)
400
401 start, stop, stride = cslice.indices(N)
IndexError: index (1) out of range
Indexing with the [0,1] syntax does work, with the two numbers applying to the two different dimensions:
In [215]: M[0,1]
Out[215]: 0.0
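Since M[0] is itself a (1,3) sparse matrix, chained indexing does work if the second index is also the two-number kind. A quick sketch (variable names mine):

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.eye(3))

# M[0] is still 2d, shape (1, 3), so a second 2-number index works:
row = M[0]
print(row.shape)   # (1, 3)
print(row[0, 1])   # 0.0 -- same as M[0, 1]
print(M[0, 1])     # 0.0
```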
A[0][1] does work with a np.ndarray, but that's because the first [0] produces an array with one less dimension. np.matrix and sparse, however, return a 2d matrix, not a 1d one. This is one reason we don't recommend np.matrix. With sparse the matrix nature goes deeper, so we can't simply deprecate it.
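The dimension-drop difference is easy to see by comparing the three types side by side (a quick check, assuming numpy and scipy are available):

```python
import numpy as np
from scipy import sparse

A = np.eye(3)                       # plain ndarray
Mm = np.matrix(np.eye(3))           # np.matrix (discouraged)
Ms = sparse.csr_matrix(np.eye(3))   # sparse csr

print(A[0].shape)    # (3,)   -- ndarray drops a dimension
print(Mm[0].shape)   # (1, 3) -- np.matrix stays 2d
print(Ms[0].shape)   # (1, 3) -- sparse stays 2d

print(A[0][1])       # 0.0 -- works because A[0] is 1d
print(Mm[0][0, 1])   # 0.0 -- the chained index must itself be 2d-style
```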
We can get an idea of the code involved in selecting an element from a sparse matrix by triggering an error:
In [216]: M[0,4]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-216-4919ae565782> in <module>
----> 1 M[0,4]
/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in __getitem__(self, key)
287 # [i, j]
288 if isintlike(col):
--> 289 return self._get_single_element(row, col)
290 # [i, 1:2]
291 elif isinstance(col, slice):
/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py in _get_single_element(self, row, col)
868 if not (0 <= row < M) or not (0 <= col < N):
869 raise IndexError("index out of bounds: 0<=%d<%d, 0<=%d<%d" %
--> 870 (row, M, col, N))
871
872 major_index, minor_index = self._swap((row, col))
IndexError: index out of bounds: 0<=0<3, 0<=4<3
===
Yes, indexing an item in a sparse matrix is slower than indexing in a dense array. It's not because it first converts to dense. With a dense array, indexing an item just requires converting the n-d index to a flat one and selecting the required bytes in the 1d data buffer, and most of that is done in fast compiled code. But as you can see from the traceback, selecting an item from a sparse matrix is more involved, and a lot of it is in Python.
The sparse lil format is designed to be faster for indexing (and especially for setting). But even that is quite a bit slower than indexing a dense array. Don't use sparse matrices if you need to iterate, or otherwise repeatedly access individual elements.
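A rough way to see this for yourself (timings vary by machine and scipy version, so treat the numbers as illustrative):

```python
import timeit
import numpy as np
from scipy import sparse

A = np.eye(100)
Mcsr = sparse.csr_matrix(A)
Mlil = Mcsr.tolil()

# all three agree on the value...
print(A[3, 3], Mcsr[3, 3], Mlil[3, 3])   # 1.0 1.0 1.0

# ...but single-element access costs differ by orders of magnitude
for name, obj in [("ndarray", A), ("csr", Mcsr), ("lil", Mlil)]:
    t = timeit.timeit(lambda: obj[3, 3], number=1000)
    print(f"{name:8s} {t:.4f}s per 1000 lookups")
```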
===
To give an idea of what's involved in indexing M, look at its key attributes:
In [224]: M.data,M.indices,M.indptr
Out[224]:
(array([1., 1., 1.]),
array([0, 1, 2], dtype=int32),
array([0, 1, 2, 3], dtype=int32))
To pick row 0, we have to use indptr to select a slice from the other two:
In [225]: slc = slice(M.indptr[0],M.indptr[1])
In [226]: M.data[slc], M.indices[slc]
Out[226]: (array([1.]), array([0], dtype=int32))
Then to pick column 1, we check whether that value is in indices[slc]. If it is, return the corresponding element of data[slc]; if not, return 0.
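Putting those steps together, here is a pure-Python sketch of the single-element lookup. This mimics what scipy's _get_single_element has to do; it is my own illustration, not scipy's actual code:

```python
import numpy as np
from scipy import sparse

def get_element(M, row, col):
    """Look up M[row, col] in a csr matrix using only its raw attributes."""
    slc = slice(M.indptr[row], M.indptr[row + 1])  # nonzeros of this row
    cols = M.indices[slc]
    vals = M.data[slc]
    hits = np.flatnonzero(cols == col)             # is col among them?
    return vals[hits[0]] if hits.size else 0.0     # stored value, else 0

M = sparse.csr_matrix(np.eye(3))
print(get_element(M, 0, 0))  # 1.0
print(get_element(M, 0, 1))  # 0.0
```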
For more complex indexing, sparse actually uses matrix multiplication, having created an extractor matrix from the indices. It also uses multiplication to perform row or column sums. Matrix multiplication is a sparse matrix strength, provided the matrix is sparse enough. The mathematical roots of sparse formats, especially csr, are in sparse linear equation problems, such as finite difference and finite element PDEs.
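To illustrate the extractor idea with a toy example (my own construction, not scipy's internals verbatim): selecting rows [2, 0] can be done by left-multiplying with a one-hot matrix, and a row sum is a multiplication by a column of ones:

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(9).reshape(3, 3))

# extractor: one-hot rows pick out rows 2 and 0 of M
E = sparse.csr_matrix(np.array([[0, 0, 1],
                                [1, 0, 0]]))
print((E @ M).toarray())   # [[6 7 8]
                           #  [0 1 2]]

# row sums via multiplication with a column of ones
ones = np.ones((3, 1))
print(M @ ones)            # same values as M.sum(axis=1)
```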
===
Here are the underlying attributes for a lil matrix:
In [227]: ml=M.tolil()
In [228]: ml.data
Out[228]: array([list([1.0]), list([1.0]), list([1.0])], dtype=object)
In [229]: ml.rows
Out[229]: array([list([0]), list([1]), list([2])], dtype=object)
In [230]: ml.data[0],ml.rows[0]
Out[230]: ([1.0], [0]) # cf Out[226]
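With lil, a single-element lookup is a list search within one row. Again a sketch of the idea, not scipy's code:

```python
import numpy as np
from scipy import sparse

ml = sparse.lil_matrix(np.eye(3))

def lil_get(ml, row, col):
    """Look up ml[row, col] from the per-row lists."""
    cols = ml.rows[row]                      # column indices stored for this row
    if col in cols:
        return ml.data[row][cols.index(col)]
    return 0.0                               # not stored, so it's a zero

print(lil_get(ml, 1, 1))  # 1.0
print(lil_get(ml, 1, 2))  # 0.0
```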
===
A[i][j] gave me an error. But A[i,j] works - do you know what this is doing? – binaryBigInt

A[i][j] should not be considered the "typical" way to index any NumPy or SciPy data structure. It doesn't even work for regular matrices, let alone sparse matrices. – user2357112 supports Monica

For a regular array A, A[i][j] is indeed possible (at least on my system). Do you mind telling me why A[i, j] is so slow? – binaryBigInt

A[i, j] gives you the value at row i and column j, just like regular matrix or array indexing. How it does that varies by SciPy version and sparse matrix format, but it doesn't materialize the whole matrix. That would be an absurdly extravagant waste. You can trace the code path from the relevant __getitem__ method in the source code, though you'll need to jump around a lot between files due to all the inheritance involved. – user2357112 supports Monica