Fast sparse matrix multiplication w/o allocating a dense array

Question

I have an m x m sparse matrix similarities and a vector with m elements, combined_scales. I wish to multiply the ith column in similarities by combined_scales[i]. Here's my first attempt at this:

for i in range(m):
    scale = combined_scales[i]
    similarities[:, i] *= scale

This is semantically correct but was performing poorly, so I tried changing it to this:

# sparse.diags creates a diagonal matrix.
# docs: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.diags.html
similarities *= sparse.diags(combined_scales)

But I immediately got a MemoryError when running this line. Bizarrely, it seems that scipy is attempting to allocate a dense numpy array here:

Traceback (most recent call last):
  File "main.py", line 108, in <module>
    loop.run_until_complete(main())
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 466, in run_until_complete
    return future.result()
  File "main.py", line 100, in main
    magic.fit(df)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 127, in fit
    self._scale_similarities(X, net_similarities)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 148, in _scale_similarities
    similarities *= sparse.diags(combined_scales)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\base.py", line 440, in __mul__
    return self._mul_sparse_matrix(other)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\compressed.py", line 503, in _mul_sparse_matrix
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
MemoryError

How do I prevent it from allocating a dense array here? Thanks.

Are you sure you're allocating a dense array? Run similarities.count_nonzero() and tell us what it returns. — OneRaynyDay
Actually, the last line of the traceback suggest it is trying to allocate a sparse result, nnz standing for "number of non zeros". — Paul Panzer

hpaulj hpaulj · Accepted Answer · 2018-04-17T02:02:58

From sparse.compressed

class _cs_matrix    # common for csr and csc
    def _mul_sparse_matrix(self, other):
        M, K1 = self.shape
        K2, N = other.shape

        major_axis = self._swap((M,N))[0]
        other = self.__class__(other)  # convert to this format

        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=M*N)
        indptr = np.empty(major_axis + 1, dtype=idx_dtype)

        fn = getattr(_sparsetools, self.format + '_matmat_pass1')
        fn(M, N,
           np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           indptr)

        nnz = indptr[-1]
        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=nnz)
        indptr = np.asarray(indptr, dtype=idx_dtype)
        indices = np.empty(nnz, dtype=idx_dtype)
        data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

        fn = getattr(_sparsetools, self.format + '_matmat_pass2')
        fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           self.data,
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           other.data,
           indptr, indices, data)

        return self.__class__((data,indices,indptr),shape=(M,N))

similarities is a sparse csr matrix. other, the diag matrix, has been converted to csr as well in

other = self.__class__(other)

csr_matmat_pass1 (compiled code) is run with the indices from self and other, returning nnz, the number of nonzero terms in the output.

It then allocates the indptr, indices and data arrays that will hold the results from csr_matmat_pass2. These are used to create the return matrix

self.__class__((data,indices,indptr),shape=(M,N))

The error occurs in creating the data array:

data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

The return result just has too many nonzero values for your memory.

What is m, and similarities.nnz?

Is there enough memory to do similarities.copy()?

While you are using similarities *= ..., it first has to do similarities * other. The result will then replace self. It does not attempt to do an in-place multiplication.

inplace iteration by column

There have been a lot of questions about faster iteration by rows (or columns), seeking to do things like sorting or getting the largest row values. Working directly with the csr attributes can speed this up considerably. I think the idea applies here

Example:

In [275]: A = sparse.random(10,10,.2,'csc').astype(int)
In [276]: A.data[:] = np.arange(1,21)
In [277]: A.A
Out[277]: 
array([[ 0,  0,  4,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0, 10,  0,  0, 16, 18],
       [ 0,  0,  0,  0,  0, 11, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  8,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  9, 12,  0,  0, 17,  0],
       [ 2,  0,  0,  0,  0, 13,  0,  0,  0,  0],
       [ 0,  0,  5,  7,  0,  0,  0, 15,  0, 19],
       [ 0,  0,  6,  0,  0,  0,  0,  0,  0, 20]])
In [280]: B = sparse.diags(np.arange(1,11),dtype=int)
In [281]: B
Out[281]: 
<10x10 sparse matrix of type '<class 'numpy.int64'>'
    with 10 stored elements (1 diagonals) in DIAgonal format>
In [282]: (A*B).A
Out[282]: 
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]], dtype=int64)

Inplace iteration on columns:

In [283]: A1=A.copy()
In [284]: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
     ...:     A1.data[i:j] *= v
     ...:     
In [285]: A1.A
Out[285]: 
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]])

Time comparisons:

In [287]: %%timeit A1=A.copy()
     ...: A1 *= B
     ...: 
375 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [288]: %%timeit A1 = A.copy()
     ...: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
     ...:     A1.data[i:j] *= v
     ...:     
79.9 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Fast sparse matrix multiplication w/o allocating a dense array

1 Answers

inplace iteration by column