1
votes

I have a sparse matrix 'a' of type csr_matrix. I want to create a new csr_matrix 'b' in which every row is a copy of the i-th row of 'a'.

I think this is possible for normal numpy arrays using the 'tile' operation, but I cannot find an equivalent for csr_matrix.

Building a dense numpy matrix first and converting it to csr_matrix is not an option, as the matrix is 10000 x 10000.

3 Answers

1
votes

I found an answer that doesn't require creating the full numpy matrix and is fast enough for my purpose. I'm adding it as an answer in case it's useful to others in the future:

import numpy as np
import scipy.sparse
rows, cols = a.shape
r = a[2]  # the row to tile (index 2), as a 1 x cols csr_matrix
b = scipy.sparse.csr_matrix((np.tile(r.data, rows), np.tile(r.indices, rows),
                             np.arange(0, rows*r.nnz + 1, r.nnz)), shape=a.shape)

This takes the row of 'a' at index 2 (the third row) and tiles it to create 'b'.
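To make the construction easier to check, here is a minimal sketch on a small matrix. The helper name tile_csr_row and the 3 x 3 test matrix are my own assumptions for illustration, not part of the original answer. It builds indptr as np.arange(n_rows + 1) * nnz rather than with a step argument, so a row with no stored entries (nnz == 0) does not produce a zero step.

```python
import numpy as np
import scipy.sparse


def tile_csr_row(a, i, n_rows):
    """Return a CSR matrix made of n_rows copies of row i of `a`.

    Hypothetical helper, written only to illustrate the idea above.
    """
    r = a[i]                              # 1 x n_cols CSR matrix holding row i
    nnz = r.nnz                           # number of stored entries in that row
    data = np.tile(r.data, n_rows)        # stored values, repeated per output row
    indices = np.tile(r.indices, n_rows)  # column indices, repeated the same way
    indptr = np.arange(n_rows + 1) * nnz  # output row k spans data[k*nnz:(k+1)*nnz]
    return scipy.sparse.csr_matrix((data, indices, indptr),
                                   shape=(n_rows, a.shape[1]))


# Small test matrix (assumed for illustration only).
a = scipy.sparse.csr_matrix(np.array([[1, 0, 2],
                                      [0, 0, 3],
                                      [4, 5, 0]]))
b = tile_csr_row(a, 2, a.shape[0])
dense_b = b.toarray()

# Compare against the dense equivalent, which is cheap at this size.
expected = np.tile(a[2].toarray(), (a.shape[0], 1))
matches = np.array_equal(dense_b, expected)
```

On a small matrix the dense comparison with np.tile is affordable, so it serves as a sanity check before running the sparse version on the full 10000 x 10000 input.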

Here is the timing test; it seems quite fast for a 10000 x 10000 matrix:

100 loops, best of 3: 2.24 ms per loop
0
votes

There is a block-matrix function, sparse.bmat, that lets you create a new sparse matrix from a nested list of other matrices.

So for a start you could

 a1 = a[I,:]
 ll = [[a1],[a1],[a1],[a1]]   # one block per row, stacked vertically
 sparse.bmat(ll)

I don't have a shell running to test this.

Internally it converts all input matrices to coo format and collects their coo attributes into 3 large lists (or arrays). In your case of tiled rows, the data and col (j) values would just repeat, while the row (i) values would step.
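The repeat-and-step pattern described above can also be written out by hand. This is only a sketch under my own assumptions (the 3 x 3 test matrix, row 0, and four copies are arbitrary choices); it is not how the library does it internally, it just builds the three coo arrays directly.

```python
import numpy as np
import scipy.sparse

# Small test matrix (assumed for illustration only).
a = scipy.sparse.csr_matrix(np.array([[1, 0, 2],
                                      [0, 0, 3],
                                      [4, 5, 0]]))
n_rows = 4                   # number of copies, arbitrary for this sketch
r = a[0].tocoo()             # row 0 in coo form, with r.data and r.col

data = np.tile(r.data, n_rows)             # values repeat unchanged
col = np.tile(r.col, n_rows)               # column indices repeat unchanged
row = np.repeat(np.arange(n_rows), r.nnz)  # row index steps once per copy

b = scipy.sparse.coo_matrix((data, (row, col)),
                            shape=(n_rows, a.shape[1])).tocsr()
```

The final tocsr() call is there because the question asks for a csr_matrix; the coo arrays themselves are the part that shows the pattern.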

Another way to approach it would be to construct a small test matrix and look at its attributes. What kinds of repetition do you see? The patterns are easy to see in the coo format. lil might also be easy to replicate, maybe with the list * n operation. csr is trickier to understand.
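A quick way to try that inspection, as a sketch: tile a tiny row with sparse vstack and print the attributes in both formats. The 1 x 4 row and the three copies are assumptions chosen only to keep the printout short.

```python
import numpy as np
import scipy.sparse

# A tiny 1 x 4 row with two stored entries (assumed for illustration only).
row = scipy.sparse.csr_matrix(np.array([[0, 7, 0, 9]]))
tiled = scipy.sparse.vstack([row, row, row], format="csr")

# coo view: one (row, col, data) triple per stored entry.
coo = tiled.tocoo()
print("coo row    :", coo.row)
print("coo col    :", coo.col)
print("coo data   :", coo.data)

# csr view: same data and column indices, plus row pointers.
print("csr indices:", tiled.indices)
print("csr indptr :", tiled.indptr)
```

In the csr view, indptr advances by the number of stored entries in the row for each copy, which is the pattern the hand-built csr construction relies on.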

0
votes

One can do

import scipy.sparse
row = a.getrow(row_idx)   # 1 x n_cols sparse row
n_rows = a.shape[0]
b = scipy.sparse.vstack([row] * n_rows, format="csr")   # stack n_rows copies