21
votes

I'm trying to build and update a sparse matrix as I read data from file. The matrix is of size 100000X40000

What is the most efficient way of updating multiple entries of the sparse matrix? specifically I need to increment each entry by 1.

Let's say I have row indices [2, 236, 246, 389, 1691]

and column indices [117, 3, 34, 2757, 74, 1635, 52]

so all the following entries must be incremented by one:

(2,117) (2,3) (2,34) (2,2757) ...

(236,117) (236,3) (236, 34) (236,2757) ...

and so on.

I'm already using lil_matrix as it gave me a warning to use while I tried to update a single entry.

lil_matrix format is already not supporting multiple updating. matrix[1:3,0] += [2,3] is giving me a notimplemented error.

I can do this naively, by incrementing every entry individually. I was wondering if there is any better way to do this, or better sparse matrix implementation that I can use.

My computer is also an average i5 machine with 4GB RAM, so I have to be careful not to blow it up :)

3
it works for lil_matrix if the right hand side is numpy array of matching shape, like A[1:3, 0] += np.array( [[ 2 ],[ 3 ]] )behzad.nouri
yeah i figured that part out. but do you have any suggestions for my specific problem? like given the row indices and column indices, i want to increment all the combination entries as given in the question.syllogismos

3 Answers

12
votes

Creating a second matrix with 1s in your new coordinates and adding it to the existing one is a possible way of doing this:

>>> import scipy.sparse as sps
>>> shape = (1000, 2000)
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in xrange(100): # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
... 
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
    with 9985 stored elements in Compressed Sparse Row format>
7
votes
import scipy.sparse

rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
prod = [(x, y) for x in rows for y in cols] # combinations
r = [x for (x, y) in prod] # x_coordinate
c = [y for (x, y) in prod] # y_coordinate
data = [1] * len(r)
m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))

I think it works well and doesn't need loops. I am directly following the doc

<100000x40000 sparse matrix of type '<type 'numpy.int32'>'
    with 35 stored elements in COOrdinate format>
5
votes

This answer expands the comment of @behzad.nouri. To increment the values at the "outer product" of your lists of rows and columns indices, just create these as numpy arrays configured for broadcasting. In this case, that means put the rows into a column. For example,

In [59]: a = lil_matrix((4,4), dtype=int)

In [60]: a.A
Out[60]: 
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [61]: rows = np.array([1,3]).reshape(-1, 1)

In [62]: rows
Out[62]: 
array([[1],
       [3]])

In [63]: cols = np.array([0, 2, 3])

In [64]: a[rows, cols] += np.ones((rows.size, cols.size))

In [65]: a.A
Out[65]: 
array([[0, 0, 0, 0],
       [1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])

In [66]: rows = np.array([0, 1]).reshape(-1,1)

In [67]: cols = np.array([1, 2])

In [68]: a[rows, cols] += np.ones((rows.size, cols.size))

In [69]: a.A
Out[69]: 
array([[0, 1, 1, 0],
       [1, 1, 2, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])