4
votes

Suppose I have two dense matrices U (10000x50) and V(50x10000), and one sparse matrix A(10000x10000). Each element in A is either 1 or 0. I hope to find A*(UV), noting that '*' is element-wise multiplication. To solve the problem, Scipy/numpy will calculate a dense matrix UV first. But UV is dense and large (10000x10000) so it's very slow.

Because I only need a few elements of UV indicated by A, it should save a lot of time if only necessary elements are calculated instead of calculating all elements then filtering using A. Is there a way to instruct scipy to do this?

BTW, I used Matlab to solve this problem and Matlab is smart enough to find what I'm trying to do and works efficiently.

Update: I found Matlab calculated UV fully as scipy does. My scipy installation is simply too slow...

1
Just to be clear, in einsum notation, you want np.einsum('ij,ik,kj->ij',A,U,V)? (not that I"m suggesting using einsum).hpaulj
In MATLAB how do you write this? A.*U*V?hpaulj
How do you define "very slow"? On my laptop A*(U.dot(V)) with the sizes above, and a dense A array takes 1.04 second to compute. Are you sure you have Numpy compiled with a fast BLAS library? I cant see a way around explicitly calculating all the elements, even though A is sparse. On the other hand, how do you know that Matlab is not doing that too?rth
Intuitively I think there should be a way of using A to generate U1 and V1 such that U1.dot(V1) is your answer. It might be easiest is each row and column of A has at most one 1, but there should be a way even if there are multiple values.hpaulj
@rth: I think you are correct. In MATLAB both timeit(@() A.*(U*V)) and timeit(@() U*V) run about equally long. So the sparsity of A is only taken into account after U*V is calculated.user2379410

1 Answers

2
votes

Here's a test script and possible speedup. The basic idea is to use the nonzero coordinates of A to select rows and columns of U and V, and then use einsum to perform a subset of the possible dot products.

import numpy as np
from scipy import sparse

#M,N,d = 10,5,.1
#M,N,d = 1000,50,.1
M,N,d = 5000,50,.01   # about the limit for my memory

A=sparse.rand(M,M,d)
A.data[:] = 1   # a sparse 0,1 array
U=(np.arange(M*N)/(M*N)).reshape(M,N)
V=(np.arange(M*N)/(M*N)).reshape(N,M)

A1=A.multiply(U.dot(V))   # the direct solution
A2=np.einsum('ij,ik,kj->ij',A.A,U,V)

print(np.allclose(A1,A2))

def foo(A,U,V):
    # use A to select elements of U and V
    A3=A.copy()
    U1=U[A.row,:]
    V1=V[:,A.col]
    A3.data[:]=np.einsum('ij,ji->i',U1,V1)
    return A3

A3 = foo(A,U,V)

print(np.allclose(A1,A3.A))

The 3 solutions match. For large arrays, foo is about 2x faster than the direct solution. For small size, the pure einsum is competitive, but bogs down for large arrays.

The use of dot in foo would have computed too many products, ij,jk->ik as opposed to ij,ji->i.