2
votes

First off, I'm no mathmatician. I admit that. Yet I still need to understand how ScyPy's sparse matrices work arithmetically in order to switch from a dense NumPy matrix to a SciPy sparse matrix in an application I have to work on. The issue is memory usage. A large dense matrix will consume tons of memory.

The formula portion at issue is where a matrix is added to a scalar.

A = V + x

Where V is a square matrix (its large, say 60,000 x 60,000) and sparsely populated. x is a float.

The operation with NumPy will (if I'm not mistaken) add x to each field in V. Please let me know if I'm completely off base, and x will only be added to non-zero values in V.

With a SciPy, not all sparse matrices support the same features, like scalar addition. dok_matrix (Dictionary of Keys) supports scalar addition, but it looks like (in practice) that it's allocating each matrix entry, effectively rendering my sparse dok_matrix as a dense matrix with more overhead. (not good)

The other matrix types (CSR, CSC, LIL) don't support scalar addition.

I could try constructing a full matrix with the scalar value x, then adding that to V. I would have no problems with matrix types as they all seem to support matrix addition. However I would have to eat up a lot of memory to construct x as a matrix, and the result of the addition could end up being fully populated matrix as well.

There must be an alternative way to do this that doesn't require allocating 100% of a sparse matrix.

I'm will to accept that large amounts of memory are needed, but I thought I would seek some advice first. Thanks.

1
Just to be clear: in the end, the result that you want is the dense matrix A? Or do you want to add the scalar x to just the nonzero elements of V?Warren Weckesser

1 Answers

5
votes

Admittedly sparse matrices aren't really in my wheelhouse, but ISTM the best way forward depends on the matrix type. If you're DOK:

>>> S = dok_matrix((5,5))
>>> S[2,3] = 10; S[4,1] = 20
>>> S.todense()
matrix([[  0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,  10.,   0.],
        [  0.,   0.,   0.,   0.,   0.],
        [  0.,  20.,   0.,   0.,   0.]])

Then you could update:

>>> S.update(zip(S.keys(), np.array(S.values()) + 99))
>>> S
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Dictionary Of Keys format>
>>> S.todense()
matrix([[   0.,    0.,    0.,    0.,    0.],
        [   0.,    0.,    0.,    0.,    0.],
        [   0.,    0.,    0.,  109.,    0.],
        [   0.,    0.,    0.,    0.,    0.],
        [   0.,  119.,    0.,    0.,    0.]])

Not particularly performant, but is O(nonzero).

OTOH, if you have something like COO, CSC, or CSR, you can modify the data attribute directly:

>>> C = S.tocoo()
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>
>>> C.data
array([ 119.,  109.])
>>> C.data += 1000
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>
>>> C.todense()
matrix([[    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,  1109.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,  1119.,     0.,     0.,     0.]])

Note that you're probably going to want to add an additional

>>> C.eliminate_zeros()

to handle the possibility that you've added a negative number and so there's now a 0 which is actually being recorded. By itself, that should work fine, but the next time you did the C.data += some_number trick, it would add somenumber to that zero you introduced.