7
votes

I would like to multiply two large sparse matrices. The first is 150,000x300,000 and the second is 300,000x300,000. The first matrix has about 1,000,000 non-zero items and the second matrix has about 20,000,000 non-zero items. Is there a straightforward way to get the product of these matrices?

I'm currently storing the matrices in CSR or CSC format and computing matrix_a * matrix_b, which raises ValueError: array is too big.

I'm guessing I could store the separate matrices on disk with PyTables, split them into smaller blocks, and construct the final matrix product from the products of many blocks. But I'm hoping for something relatively simple to implement.
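For reference, a minimal in-memory sketch of that block idea might look like the following. The function name blockwise_product and the rows_per_block parameter are made up for illustration, and the sketch assumes both inputs fit in RAM; only the result is built slab by slab, so each slab could be written to disk instead of being kept in a list.

```python
import scipy.sparse as sp


def blockwise_product(a, b, rows_per_block=10_000):
    """Compute the sparse product a @ b one horizontal slab of `a` at a time.

    Hypothetical helper: name and parameters are illustrative only.
    """
    a = sp.csr_matrix(a)  # CSR makes row slicing cheap
    b = sp.csr_matrix(b)
    pieces = []
    for start in range(0, a.shape[0], rows_per_block):
        stop = min(start + rows_per_block, a.shape[0])
        # Each slab product is an independent sparse matrix; it could be
        # saved to disk here instead of being appended to a list.
        pieces.append(a[start:stop] @ b)
    # Stack the slabs back into one matrix with a.shape[0] rows.
    return sp.vstack(pieces, format="csr")
```

Splitting only the rows of the first matrix keeps the bookkeeping small, because no partial sums have to be accumulated across blocks.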

EDIT: I'm hoping for a solution that works for arbitrarily large sparse matrices, while hiding (or avoiding) the bookkeeping involved in moving individual blocks back and forth between memory and disk.

1
What shape should the result have? - eumiro
@miro: 150,000 by 300,000. But I expect the product will still be sparse. - DanB

1 Answer

6
votes

Strange, because the following worked for me:

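import scipy.sparse
# Dimensions must be ints; each density is nnz / (rows * cols).
mat1 = scipy.sparse.rand(150000, 300000, density=1e6 / (150e3 * 300e3))
mat2 = scipy.sparse.rand(300000, 300000, density=20e6 / (300e3 * 300e3))
# rand() returns COO by default; convert to CSC before multiplying.
cmat1 = scipy.sparse.csc_matrix(mat1)
cmat2 = scipy.sparse.csc_matrix(mat2)
res = cmat1 * cmat2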

I'm using the latest SciPy, and Python used about 3 GB of RAM.

So maybe your matrices are structured such that their product is not very sparse?
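One way to check this before multiplying is to count the scalar multiplications the product needs, which is also an upper bound on the number of stored entries in the result. The sketch below is only an illustration: product_nnz_upper_bound is a made-up name, not a SciPy function. For uniformly scattered non-zeros with your sizes, each of the roughly 1,000,000 entries of the first matrix meets about 20,000,000 / 300,000 ≈ 67 entries in the matching row of the second, so the bound would be on the order of 67 million. Clustered rows or columns can push it much higher.

```python
import numpy as np
import scipy.sparse as sp


def product_nnz_upper_bound(a, b):
    """Upper bound on the stored entries of a @ b (hypothetical helper).

    Entry a[i, k] is multiplied with every stored entry in row k of b, so
    the bound is sum over k of nnz(a[:, k]) * nnz(b[k, :]).
    """
    a = sp.csr_matrix(a)
    b = sp.csr_matrix(b)
    # In CSR, `indices` holds the column index of every stored entry.
    nnz_per_col_a = np.bincount(a.indices, minlength=a.shape[1]).astype(np.int64)
    # Differences of the row pointer give the stored entries per row.
    nnz_per_row_b = np.diff(b.indptr).astype(np.int64)
    return int(nnz_per_col_a @ nnz_per_row_b)
```

If the bound is close to rows * cols of the result, the product is effectively dense and no sparse format will make it fit.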