Concatenating specific rows from a sparse matrix in scipy

Question

I have a large sparse matrix (using scipy.sparse) with I rows and U columns, U is much greater than I. I have a list of U random numbers in the range of 0:I. I would like to create a new sparse matrix which will be a U * U sparse matrix, the row for user u will hold all the U values in row i of the original sparse matrix. For example, if the original matrix is a 3*5 matrix:

0,0,2,1,0
0,0,3,4,1
1,1,0,2,0

and the list of random numbers is [0,0,2,1,2]

The resulting matrix should be:

0,0,2,1,0
0,0,2,1,0
1,1,0,2,0
0,0,3,4,1
1,1,0,2,0

I am using this code now, which is very very slow:

for u in range(U):
    i= random_indices[u]
    if u == 0:
        output_sparse_matrix = original_sparse_matrix[i, :]
    else:
        output_sparse_matrix = vstack((output_sparse_matrix,
                                       original_sparse_matrix[i, :]))

Any suggestions on how this can be done quicker?

Update I used Jérôme Richard's suggestion, but inside a loop - since I got an out of memory error. This is the solution that worked:

bins = np.array_split(random_indices, 10)
output_sparse_matrix = original_sparse_matrix[bins[0]]

for bin in bins[1:10]:
   output_sparse_matrix = vstack((output_sparse_matrix ,original_sparse_matrix[bin]))

Even with dense arrays, repeatedly doing concatenate in a loop is slow. There we recommend collecting the arrays in a list and doing one join at the end. I haven't played with sparse.vstack much, but I think the same applies sparse.vstack combines the coo attributes of its arguments, and uses them to make a new coo matrix. — hpaulj
Look at the code for sparse.vstack. It delegates the task to sparse.bmat, bmat([[b] for b in blocks]). — hpaulj

Jérôme Richard Jérôme Richard · Accepted Answer · 2021-04-10T16:44:39

vstack create a new matrix for every iteration. This is the main source of slowdown since the complexity of the algorithm is O(U^3). You can just append the new lines in a Python list and then vstack the list of lines. Alternatively, a better approach is just to use the following Numpy expression :

original_sparse_matrix[random_indices, :]

Concatenating specific rows from a sparse matrix in scipy

2 Answers