I have a number of scipy sparse matrices (currently in CSR format) that I need to multiply with a dense numpy 1D vector.
The vector is called G:
print G.shape, G.dtype
(2097152,) complex64
Each sparse matrix has shape (16384, 2097152) and is very sparse, with a density of approximately 4.0e-6.
I have a list of 100 of these sparse matrices, called spmats.
I can easily multiply each matrix with G like so:
res = [spmat.dot(G) for spmat in spmats]
This results in a list of dense vectors of shape (16384,), as expected.
My application is fairly performance-critical, so I tried an alternative: first concatenate all the sparse matrices into one large sparse matrix, then use only one call to dot(), like so:
import scipy.sparse as sp
SPMAT = sp.vstack(spmats, format='csr')
RES = SPMAT.dot(G)
This results in one long vector RES with shape (1638400,), which is a concatenation of all the result vectors in res above, as expected. I've checked that the results are identical.
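For what it's worth, the identity check was along these lines:

import numpy as np
assert np.allclose(np.concatenate(res), RES)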
Maybe I'm completely wrong, but I expected the second case to be faster than the first, since there are far fewer numpy calls, memory allocations, Python object creations, Python loop iterations, etc. I don't care about the time required to concatenate the sparse matrices, only the time to compute the result. According to %timeit, however:
%timeit res = [spmat.dot(G) for spmat in spmats]
10 loops, best of 3: 91.5 ms per loop
%timeit RES = SPMAT.dot(G)
1 loops, best of 3: 389 ms per loop
I've checked that I'm not running out of memory in either operation, and nothing fishy seems to be going on. Am I crazy, or is this really strange? Does this mean that all sparse matrix-vector products should be done in blocks, a few rows at a time, to make them faster? As far as I understand, the time to multiply a sparse matrix by a dense vector should be linear in the number of nonzero elements, which is unchanged between the two cases above. What could be making such a difference?
I'm running on a single-core linux machine with 4GB RAM, using EPD 7.3.
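To spell out the linearity assumption: as I understand it, a CSR matrix-vector product boils down to a single pass over the stored nonzeros, something like this pure-Python sketch (scipy's actual routine is compiled, but the amount of work should scale the same way):

import numpy as np

def csr_matvec(data, indices, indptr, x):
    # Naive CSR matvec: one multiply-add per stored nonzero, so the
    # total work is O(nnz) no matter how the rows are grouped into matrices.
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows, dtype=np.result_type(data, x))
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y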
EDIT:
Here is a small example that reproduces the problem for me:
import scipy.sparse as sp
import numpy as n
G = n.random.rand(128**3) + 1.0j*n.random.rand(128**3)
spmats = [sp.rand(128**2, 128**3, density=4e-6, format='csr', dtype=n.float64) for i in range(100)]
SPMAT = sp.vstack(spmats, format='csr')
%timeit res = [spmat.dot(G) for spmat in spmats]
%timeit RES = SPMAT.dot(G)
I get:
1 loops, best of 3: 704 ms per loop
1 loops, best of 3: 1.34 s per loop
The performance difference in this case is not as large as with my own sparse matrices, which have some structure (probably because of caching), but it's still slower to concatenate the matrices.
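A hypothetical follow-up experiment, using the names from the example above, would be to slice SPMAT back into row blocks of 128**2 rows and time those; if row-block size is what matters, this should roughly recover the faster list timing:

nrows = 128**2  # rows per original matrix
blocks = [SPMAT[i*nrows:(i + 1)*nrows, :] for i in range(100)]
%timeit res_blocks = [blk.dot(G) for blk in blocks]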
I have tried with both scipy 0.10.1 and 0.12.0.
G has dtype=np.complex64 as you said, and both approaches are equally fast when G has dtype=np.complex128. – jorgeca
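To check that dtype observation against the example above (a sketch; SPMAT keeps its float64 data, only the vector's dtype changes):

G64 = G.astype(n.complex64)    # mixed float64 matrix * complex64 vector
G128 = G.astype(n.complex128)  # float64 matrix * complex128 vector
%timeit RES = SPMAT.dot(G64)
%timeit RES = SPMAT.dot(G128)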