
According to my previous question here (Python - Multiply sparse matrix row with non-sparse vector by index), direct indexing of sparse matrices is not possible (at least not without working with the three arrays by which the sparse.csr matrix is defined: data, indices, indptr). But I just found out that, given a csr sparse matrix A, the operation A[i, j] works fine and produces correct results. What I also noticed: it is horribly slow, even slower than working with dense matrices.

I couldn't find any information about this indexing method, so I am wondering: what exactly is A[i, j] doing?

If you'd like me to take a guess, I would say it is producing a dense matrix and then indexing it like you normally would.

What gave you the idea you can't index a sparse matrix? Doing so is often a bad idea, but it's quite possible. - user2357112 supports Monica
@user2357112 Because the typical A[i][j] gave me an error. But A[i,j] works - do you know what this is doing? - binaryBigInt
A[i][j] should not be considered the "typical" way to index any NumPy or SciPy data structure. It doesn't even work for regular matrices, let alone sparse matrices. - user2357112 supports Monica
Given a 2-dimensional numpy array A, A[i][j] is indeed possible (at least on my system). Do you mind telling me why A[i, j] is so slow? - binaryBigInt
As for A[i, j], it gives you the value at row i and column j, just like regular matrix or array indexing. How it does that varies by SciPy version and sparse matrix format, but it doesn't materialize the whole matrix. That would be an absurdly extravagant waste. You can trace the code path from the relevant __getitem__ method in the source code, though you'll need to jump around a lot between files due to all the inheritance involved. - user2357112 supports Monica

1 Answer

In [211]: M = sparse.csr_matrix(np.eye(3))                                   
In [212]: M                                                                  
Out[212]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>

Indexing with [0] produces a new sparse matrix with (1, 3) shape:

In [213]: M[0]                                                               
Out[213]: 
<1x3 sparse matrix of type '<class 'numpy.float64'>'
    with 1 stored elements in Compressed Sparse Row format>

Trying to index that again gives another sparse matrix, or an error. That's because it is still a 2d object with (1, 3) shape.

In [214]: M[0][1]                                                            
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-214-0661a1f27e52> in <module>
----> 1 M[0][1]

/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in __getitem__(self, key)
    290             # [i, 1:2]
    291             elif isinstance(col, slice):
--> 292                 return self._get_row_slice(row, col)
    293             # [i, [1, 2]]
    294             elif issequence(col):

/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in _get_row_slice(self, i, cslice)
    397 
    398         if i < 0 or i >= M:
--> 399             raise IndexError('index (%d) out of range' % i)
    400 
    401         start, stop, stride = cslice.indices(N)

IndexError: index (1) out of range

Indexing with the [0,1] syntax does work, with the two numbers applying to the two different dimensions:

In [215]: M[0,1]                                                             
Out[215]: 0.0

A[0][1] does work with a np.ndarray, but that's because the first [0] produces an array with one less dimension. np.matrix and sparse matrices, however, return a 2d (1, 3) matrix for [0], not a 1d one. That's one reason we don't recommend np.matrix. With sparse the matrix nature goes deeper, so we can't simply deprecate it.
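
A quick way to see that dimension difference (a small sketch of my own, not from the original post):

import numpy as np
from scipy import sparse

A = np.eye(3)                        # plain ndarray
print(A[0].shape)                    # (3,)  - the first index drops a dimension
print(A[0][1])                       # 0.0   - so chained indexing works

Am = np.matrix(np.eye(3))            # np.matrix stays 2d
print(Am[0].shape)                   # (1, 3)

Ms = sparse.csr_matrix(np.eye(3))
print(Ms[0].shape)                   # (1, 3) - a sparse "row" is still 2d
print(Ms[0, 1])                      # 0.0   - [i, j] indexes both dimensions at once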

We can get an idea of the code involved in selecting an element from a sparse matrix by triggering an error:

In [216]: M[0,4]                                                             
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-216-4919ae565782> in <module>
----> 1 M[0,4]

/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in __getitem__(self, key)
    287             # [i, j]
    288             if isintlike(col):
--> 289                 return self._get_single_element(row, col)
    290             # [i, 1:2]
    291             elif isinstance(col, slice):

/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py in _get_single_element(self, row, col)
    868         if not (0 <= row < M) or not (0 <= col < N):
    869             raise IndexError("index out of bounds: 0<=%d<%d, 0<=%d<%d" %
--> 870                              (row, M, col, N))
    871 
    872         major_index, minor_index = self._swap((row, col))

IndexError: index out of bounds: 0<=0<3, 0<=4<3

===

Yes, indexing an item in a sparse matrix is slower than indexing in a dense array. It's not because it first converts to dense. With a dense array, indexing an item just requires converting the n-d index to a flat one and selecting the required bytes in the 1d data buffer - and most of that is done in fast compiled code. But as you can see from the traceback, selecting an item from a sparse matrix is more involved, and a lot of it is Python-level code.

Sparse lil format is designed to be faster for indexing (and especially for setting). But even that is quite a bit slower than indexing a dense array. Don't use sparse matrices if you need to iterate, or otherwise repeatedly access individual elements.
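
A rough timing sketch (my own illustration; the exact numbers depend on SciPy version and hardware, but dense access is far faster, with lil usually ahead of csr for single items):

import timeit
import numpy as np
from scipy import sparse

A = np.eye(100)
Mc = sparse.csr_matrix(A)
Ml = sparse.lil_matrix(A)

# time repeated single-element access for each format
for name, obj in [('dense', A), ('csr', Mc), ('lil', Ml)]:
    t = timeit.timeit(lambda: obj[50, 50], number=10000)
    print(name, t)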

===

To give an idea of what's involved with indexing M, look at its key attributes:

In [224]: M.data,M.indices,M.indptr                                          
Out[224]: 
(array([1., 1., 1.]),
 array([0, 1, 2], dtype=int32),
 array([0, 1, 2, 3], dtype=int32))

To pick row 0, we have to use indptr to select a slice from the other two arrays:

In [225]: slc = slice(M.indptr[0],M.indptr[1])                               
In [226]: M.data[slc], M.indices[slc]                                        
Out[226]: (array([1.]), array([0], dtype=int32))

Then, to pick column 1, we check whether that column is in indices[slc]. If it is, return the corresponding element of data[slc]. If not, return 0.
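
A rough pure-Python version of that lookup (my own sketch with a hypothetical helper csr_get; the real _get_single_element also handles negative indices and duplicate entries):

import numpy as np
from scipy import sparse

def csr_get(M, i, j):
    # nonzeros of row i live in data[indptr[i]:indptr[i+1]]
    slc = slice(M.indptr[i], M.indptr[i + 1])
    cols = M.indices[slc]
    vals = M.data[slc]
    hit = np.flatnonzero(cols == j)
    return vals[hit[0]] if hit.size else 0.0   # stored value, or implicit zero

M = sparse.csr_matrix(np.eye(3))
print(csr_get(M, 0, 0), csr_get(M, 0, 1))      # 1.0 0.0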

For more complex indexing, sparse actually uses matrix multiplication, having created an extractor matrix from the indices. It also uses multiplication to perform row or column sums.
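
To illustrate the extractor idea (again my own sketch of the concept, not SciPy's actual code):

import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(12).reshape(3, 4))
rows = np.array([2, 0])                         # rows we want, in order

# build a (2, 3) selector with a single 1 per row, then multiply
extractor = sparse.csr_matrix(
    (np.ones(len(rows)), (np.arange(len(rows)), rows)),
    shape=(len(rows), M.shape[0]))
print((extractor @ M).toarray())                # rows 2 and 0 of M
print(M.toarray()[rows])                        # same result via dense indexing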

Matrix multiplication is a sparse matrix strength - provided the matrix is sparse enough. The mathematical roots of sparse formats, especially csr, are in sparse linear equation problems, such as finite difference and finite element PDEs.

===

Here are the underlying attributes for a lil matrix:

In [227]: ml=M.tolil()                                                       
In [228]: ml.data                                                            
Out[228]: array([list([1.0]), list([1.0]), list([1.0])], dtype=object)
In [229]: ml.rows                                                            
Out[229]: array([list([0]), list([1]), list([2])], dtype=object)
In [230]: ml.data[0],ml.rows[0]                                              
Out[230]: ([1.0], [0])          # cf Out[226]
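
And a sketch of how an element lookup can work from those attributes (my illustration, not SciPy's actual implementation):

import numpy as np
from scipy import sparse

ml = sparse.csr_matrix(np.eye(3)).tolil()

def lil_get(ml, i, j):
    row_cols = ml.rows[i]                   # list of stored column indices for row i
    if j in row_cols:
        return ml.data[i][row_cols.index(j)]
    return 0.0                              # column not stored, so implicit zero

print(lil_get(ml, 0, 0), lil_get(ml, 0, 1))    # 1.0 0.0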