45
votes

Suppose I have a 2d sparse array. In my real usecase both the number of rows and columns are much bigger (say 20000 and 50000) hence it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp

>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])

Now suppose I have a dense 1d array with all non-zeros components with size 3 (or 50000 in my real life case):

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy are of the np.matrix: the '*' operator is overloaded to have it behave like a matrix-multiply instead of the elementwise-multiply:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

>>> ssp.issparse(a.toarray())
False

Any idea how to build this while keeping only sparse datastructures and without having to do a unefficient python loop on the columns of 'a'?

3
If d is a sparse matrix of the same size as a you can use a.multiply(d). Perhaps you can make a d that's N rows long and loop over N rows of a at a time?mtrw
But d is dense and cannot be broadcasted explicitly in memory to satisfy the multiply shape requirements. Looping over a batch is an option but I find this a bit hackish. I would have thought there was a vanilla vectorized / scipy way to do this without a python loop.ogrisel
I guess the problem is you want the representation of a (sparse) matrix but the mulitply operation of an array. I think you're going to have to roll your own unfortunately.mtrw
Actually there is a.multply(d) that should do exactly that but it does not do the broadcasting as usual. Maybe it's a bug.ogrisel

3 Answers

47
votes

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])

Hope that helps!

26
votes

I think A.multiply(B) should work in scipy sparse. The method multiply does "point-wise" multiplication, not matrix multiplication.

HTH

1
votes

Well, here's a simple code that will do what you want. I don't know if it is as efficient as you would like, so take it or leave it:

import scipy.sparse as ssp
def pointmult(a,b):
    x = a.copy()
    for i in xrange(a.shape[0]):
        if x.data[i]:
            for j in xrange(len(x.data[i])):
                x.data[i] *= b[x.rows[i]]
    return x

It only works with lil matrices so you'll have to make some changes if you want it to work with other formats.