How to do efficient matrix multiplication between a dense numpy matrix and a sparse scipy vector?

Question

Multiplying a dense numpy matrix with a sparse scipy vector using @ is incredibly inefficient . It seems like it doesn't make use of the sparsity of the vector at all.

Say we have

A = np.eye(5000)
x = np.eye(5000)[333]
x = scipy.sparse.coo_matrix(x).T # make it a sparse vector

Then multiplication using @ yields:

%timeit A @ x
8 ms ± 78.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Let's write an incredibly crappy sparse multiplication ourselves:

def mult_dense_with_sparse(A, x):
  return (A[:,x.nonzero()[0]] @ x.toarray()[x.nonzero()[0]]).T[0]

Lo and behold:

%timeit mult_dense_with_sparse(A, x)
50.3 µs ± 45.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

It's much faster! Even though this implementation first creates a dense vector by adding all the zeros again, and then removes all the zeros again... So I'm wondering, if not @, how can I multiply a dense numpy matrix with a sparse scipy vector efficiently? Surely such a basic operation is part of scipy?

Edit: provided solution in other question is not helpful as it is just as inefficient as @:

%timeit scipy.sparse.csr_matrix.dot(A, x)
7.97 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Edit 2 in responaw Hameer Abbasi:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   101                                                       @profile
   102                                                       def ratio(self, n):
   103        80         51.0      0.6      0.0                  s = n + 1
   104        80   11401854.0 142523.2     16.1                  self.sfpc = self.scolPCA(s) # sparse first principal component
   106        80     351898.0   4398.7      0.5                  wSums = (self.signals[:,self.sfpc.nonzero()[0]] @ self.sfpc.toarray()[self.sfpc.nonzero()[0]]).T[0]
   108        80   56487433.0 706092.9     79.7                  wSums = self.sfpc.T.dot(self.signals.T)[0]
   110        80    2521189.0  31514.9      3.6                  self.Flag, self.threshold, self.incline, self.deltaVar = self.actFunctOpt(list(wSums))
   111        80        160.0      2.0      0.0                  return  self.deltaVar / (2 + s)

Here you can see that our 'hack' takes about 0.5% of the time of this function, while using dot takes 79.7% of the time in this function.

Possible duplicate of Multiplying Numpy/Scipy Sparse and Dense Matrices Efficiently — user2699
I provided an edit showing that scipy.sparse.csr_matrix.dot is just as inefficient in the case of matrix-vector multiplication. — Bart Louwers
If the result is a dense array, the sparse matrix has first been converted to an array (.toarray()). Only sparse*sparse returns sparse, using scipy.sparse's own code. — hpaulj

Hameer Abbasi Hameer Abbasi · Accepted Answer · 2018-09-12T14:45:41

In your example, A is of type np.ndarray, and x is of type scipy.sparse.coo_matrix.

If you're looking for the simplest answer possible to speed this up, here it is:

import numpy as np
import scipy.sparse as sps
A = np.random.rand(5000, 5000)
v = np.zeros((5000, 1))
v[333] = 1
v_s = sps.csc_matrix(v)
%timeit v_s.T.dot(A).T # 46.6 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit A @ v # 6.75 ms ± 29.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

However, if you want to get the mechanics behind this answer: Currently, the operator @ doesn't support overloading for a NumPy array due to a bug. Also, scipy.sparse.coo_matrix doesn't even try to override @, and matrix multiplication is faster for scipy.sparse.csr_matrix (coo_matrix will first be converted to csr_matrix for multiplication anyway).

So what happens is NumPy converts your sparse vector into a NumPy array, then performs dense multiplication.

However, csr_matrix.dot exists and supports multiplication with dense NumPy arrays. So we use the property A @ B = (B.T @ A.T).T, along with the fact that csc_matrix.T produces csr_matrix to produce the above optimised code.

How to do efficient matrix multiplication between a dense numpy matrix and a sparse scipy vector?

1 Answers