Suppose I have a 2D sparse array. In my real use case both the number of rows and the number of columns are much bigger (say 20000 and 50000), so it cannot fit in memory when a dense representation is used:
>>> import numpy as np
>>> import scipy.sparse as ssp
>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])
Now suppose I have a dense 1D array of size 3 (or 50000 in my real-life case) whose components are all non-zero:
>>> d = np.ones(3) * 3
>>> d
array([ 3., 3., 3.])
I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy follow the np.matrix semantics: the '*' operator is overloaded to behave like a matrix multiply instead of an elementwise multiply:
>>> a * d
array([ 0., -3., 0., 0., 6.])
One solution would be to make 'a' switch to the array semantics for the '*' operator, which would give the expected result:
>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])
But I cannot do that, since the call to toarray() would materialize the dense version of 'a', which does not fit in memory (and the result would be dense too):
>>> ssp.issparse(a.toarray())
False
Any idea how to build this while keeping only sparse data structures and without having to do an inefficient Python loop over the columns of 'a'?
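For illustration, one possible way to get this result without densifying 'a' is to right-multiply it by a sparse diagonal matrix built from d, since multiplying by diag(d) scales column j by d[j]. This is only a sketch on the toy data above, assuming a scipy version that provides ssp.diags; the name scaled is introduced here for the example.

```python
import numpy as np
import scipy.sparse as ssp

# Same toy data as in the question
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
d = np.ones(3) * 3

# Right-multiplying by diag(d) scales column j of `a` by d[j],
# which matches what broadcasting would do for dense arrays.
# CSR is used because LIL is not meant for arithmetic.
scaled = a.tocsr().dot(ssp.diags(d))

print(ssp.issparse(scaled))
print(scaled.toarray())
```

The diagonal matrix stores only len(d) values, so nothing of size rows x columns is ever allocated.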
If d is a sparse matrix of the same size as a, you can use a.multiply(d). Perhaps you can make a d that's N rows long and loop over N rows of a at a time? – mtrw
a.multiply(d) should do exactly that, but it does not do the broadcasting as usual. Maybe it's a bug. – ogrisel
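The block-wise idea from the first comment could be sketched roughly as follows: tile d into a small sparse matrix that is N rows long, then call multiply on N rows of a at a time so that both operands always have the same shape. The helper name multiply_by_row_blocks and the block_rows parameter are hypothetical, chosen only for this example, and the loop runs over blocks of rows rather than over columns.

```python
import numpy as np
import scipy.sparse as ssp

def multiply_by_row_blocks(a, d, block_rows=2):
    """Scale column j of sparse `a` by d[j], processing block_rows rows at a time."""
    a = a.tocsr()  # CSR supports efficient row slicing
    # Sparse matrix with block_rows copies of d stacked as rows
    d_block = ssp.csr_matrix(np.tile(d, (block_rows, 1)))
    blocks = []
    for start in range(0, a.shape[0], block_rows):
        block = a[start:start + block_rows]
        # The last block may be shorter, so trim d_block to the same height
        blocks.append(block.multiply(d_block[:block.shape[0]]))
    return ssp.vstack(blocks, format="csr")

# Same toy data as in the question
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
d = np.ones(3) * 3

result = multiply_by_row_blocks(a, d)
print(result.toarray())
```

With a larger block_rows the Python loop runs fewer times, at the cost of a bigger tiled copy of d.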