1
votes

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C. Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?

3

3 Answers

3
votes

Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.

import numpy as np
import scipy as sp

iX, iY = sp.nonzero(W)
values = np.sum(A[iX]*B[:, iY].T, axis=-1) #batched dot product
C = sp.sparse.coo_matrix(values, np.asarray([iX,iY]).T)
3
votes

First, get the indexes of the non zero places in W, and then you can just get the (i,j) element of the result matrix by multiplying the i-th row in A with the j-th column in B, and save the result as a tuple (i,j,res) instead of saving it as a matrix (this is the right way to save sparse matrices).

3
votes

Here's one approach using np.einsum for a vectorized solution -

from scipy import sparse
from scipy.sparse import coo_matrix

# Get row, col for the output array
r,c,_= sparse.find(W)

# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])

# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)

Sample run -

1) Inputs :

In [168]: A
Out[168]: 
array([[4, 6, 1, 1, 1],
       [0, 8, 1, 3, 7],
       [2, 8, 3, 2, 2],
       [3, 4, 1, 6, 3]])

In [169]: B
Out[169]: 
array([[5, 2, 4],
       [2, 1, 3],
       [7, 7, 2],
       [5, 7, 5],
       [8, 5, 0]])

In [176]: W
Out[176]: 
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
    with 5 stored elements in Compressed Sparse Row format>

In [177]: W.toarray()
Out[177]: 
array([[ True, False, False],
       [False, False, False],
       [ True,  True, False],
       [ True, False,  True]], dtype=bool)

2) Using dense array to perform direct calculations and verify results later on :

In [171]: (A.dot(B))*W.toarray()
Out[171]: 
array([[52,  0,  0],
       [ 0,  0,  0],
       [73, 57,  0],
       [84,  0, 56]])

3) Use the proposed codes and get sparse matrix output :

In [172]: # Using proposed codes
     ...: r,c,_= sparse.find(W)
     ...: out = np.einsum('ij,ji->i',A[r],B[:,c])
     ...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
     ...: 

4) Finally verify results by converting to dense/array version and checking against direct version -

In [173]: out_sparse.toarray()
Out[173]: 
array([[52,  0,  0],
       [ 0,  0,  0],
       [73, 57,  0],
       [84,  0, 56]])