3 votes

I am a researcher working on geophysical inversion, which often requires solving a linear system Au = rhs. Here A is usually a sparse matrix, while rhs and u can be either dense matrices or vectors. To run a gradient-based inversion we need sensitivity computations, which involve a number of matrix-matrix and matrix-vector multiplications. Recently I found some surprising behaviour in sparse matrix - dense matrix multiplication; below is an example:

import numpy as np
import scipy.sparse as sp
n = int(1e6)
m = int(100)
e = np.ones(n)
A = sp.spdiags(np.vstack((e, e, e)), np.array([-1, 0, 1]), n, n)
A = A.tocsr()
u = np.random.randn(n,m)

%timeit rhs = A*u[:,0]
#10 loops, best of 3: 22 ms per loop    
%timeit rhs = A*u[:,:10]
#10 loops, best of 3: 98.4 ms per loop
%timeit rhs = A*u
#1 loop, best of 3: 570 ms per loop

I was expecting an almost linear increase in computation time as I increase the number of columns of the dense matrix u multiplied by the sparse matrix A (e.g. I expected the second case, A*u[:,:10], to take about 220 ms and the final one, A*u, about 2.2 s). However, it is much faster than I expected. Conversely, matrix-vector multiplication is much slower than matrix-matrix multiplication per column. Can someone explain why? Further, is there an effective way to bring matrix-vector multiplication to a similar level of efficiency as matrix-matrix multiplication?
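One workaround worth noting (a sketch, not something from the question itself): SciPy dispatches to different internal routines depending on whether the right-hand operand is 1-D or 2-D, so keeping a single column as a 2-D (n, 1) slice routes the product through the faster matrix-matrix path while producing the same numbers:

```python
import numpy as np
import scipy.sparse as sp

n = 100_000  # smaller than in the question, just to illustrate
e = np.ones(n)
A = sp.spdiags(np.vstack((e, e, e)), np.array([-1, 0, 1]), n, n).tocsr()
u = np.random.randn(n, 5)

# 1-D slice -> matrix-vector routine, returns a 1-D array of shape (n,)
v1 = A @ u[:, 0]

# 2-D single-column slice -> matrix-matrix routine, returns shape (n, 1)
v2 = np.asarray(A @ u[:, 0:1])

# both paths compute the same product
ok = np.allclose(v1, v2.ravel())
```

Whether the 2-D path is actually faster on a given machine depends on the SciPy build, so it is worth timing both variants with %timeit before relying on it.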

1
You/I would have to dig into the function call stack for these multiplications, and that's not a trivial task. Clearly it isn't just iterating over columns of u and collecting values. This is a sparse*dense => dense case, different from sparse*sparse or dense*dense. – hpaulj

1 Answer

2 votes

If you look at the source code, you can see that csr_matvec (which implements matrix-vector multiplication) is written as a straightforward summation loop in C code, while csr_matvecs (which implements matrix-matrix multiplication) is implemented as a call to the axpy BLAS routine. Depending on which BLAS library your installation is linked against, such a call can be far more efficient than the straightforward C implementation used for matrix-vector multiplication. That is likely why matrix-vector multiplication is so much slower than you expected.

Changing scipy so that it calls BLAS in the matrix-vector case could be a useful contribution to the package.
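Since the speed of the BLAS-backed path depends on which library NumPy/SciPy were built against, it can be useful to check your own installation. One way (the exact output format varies between versions and builds) is:

```python
import numpy as np
import scipy

# Print the BLAS/LAPACK build information for each package.
# Look for entries like "openblas", "mkl", or a reference
# implementation in the printed configuration.
np.show_config()
scipy.show_config()
```

An optimized BLAS such as OpenBLAS or MKL will typically make the matrix-matrix path dramatically faster than a reference BLAS, which would amplify exactly the gap described in the question.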