I have a large sparse matrix (using scipy.sparse) with I rows and U columns, where U is much greater than I. I also have a list of U random integers, each in the range 0 to I-1. I would like to create a new U * U sparse matrix in which row u is a copy of row i of the original matrix, where i is the u-th entry of the list. For example, if the original matrix is a 3*5 matrix:
0,0,2,1,0
0,0,3,4,1
1,1,0,2,0
and the list of random numbers is [0,0,2,1,2]
The resulting matrix should be:
0,0,2,1,0
0,0,2,1,0
1,1,0,2,0
0,0,3,4,1
1,1,0,2,0
I am currently using this code, which is very slow:
from scipy.sparse import vstack

for u in range(U):
    i = random_indices[u]
    if u == 0:
        output_sparse_matrix = original_sparse_matrix[i, :]
    else:
        output_sparse_matrix = vstack((output_sparse_matrix,
                                       original_sparse_matrix[i, :]))
Any suggestions on how this can be done quicker?
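For reference, here is a minimal sketch of the row-indexing approach that the update below builds on, applied to the 3*5 example. It assumes the matrix is stored in CSR format, which the post does not state, and it relies on the fact that indexing a CSR matrix with an integer array selects (and repeats) whole rows in one call.

```python
import numpy as np
from scipy import sparse

# The 3*5 example matrix from the question (CSR format is an assumption).
original_sparse_matrix = sparse.csr_matrix(np.array([
    [0, 0, 2, 1, 0],
    [0, 0, 3, 4, 1],
    [1, 1, 0, 2, 0],
]))
random_indices = np.array([0, 0, 2, 1, 2])

# Indexing with an integer array picks one source row per output row,
# so no Python-level loop or repeated vstack is needed.
output_sparse_matrix = original_sparse_matrix[random_indices]

print(output_sparse_matrix.toarray())
```

On the example data this produces the 5*5 matrix listed above.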
Update: I used Jérôme Richard's suggestion, but applied it in chunks inside a loop, because I got an out-of-memory error. This is the solution that worked:
import numpy as np
from scipy.sparse import vstack

bins = np.array_split(random_indices, 10)
output_sparse_matrix = original_sparse_matrix[bins[0]]
for chunk in bins[1:]:
    output_sparse_matrix = vstack((output_sparse_matrix, original_sparse_matrix[chunk]))
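A possible variation on the chunked loop is to collect the row blocks in a list and call vstack once at the end, so the growing result is not rebuilt on every iteration. The sketch below is only an illustration: `gather_rows_chunked` and its `n_chunks` parameter are hypothetical names, CSR format is assumed, and whether this helps with the out-of-memory error depends on where the memory is actually spent, which the post does not say.

```python
import numpy as np
from scipy import sparse


def gather_rows_chunked(matrix, row_indices, n_chunks=10):
    """Hypothetical helper: select rows chunk by chunk, then stack once."""
    matrix = matrix.tocsr()  # row indexing is efficient on CSR
    chunks = np.array_split(np.asarray(row_indices), n_chunks)
    # Skip empty chunks, which appear when n_chunks exceeds the index count.
    blocks = [matrix[chunk] for chunk in chunks if len(chunk) > 0]
    # A single vstack call builds the result from all blocks at once.
    # Note: this raises an error if row_indices is empty.
    return sparse.vstack(blocks, format="csr")


# Usage on the example from the question.
original = sparse.csr_matrix(np.array([
    [0, 0, 2, 1, 0],
    [0, 0, 3, 4, 1],
    [1, 1, 0, 2, 0],
]))
indices = [0, 0, 2, 1, 2]
result = gather_rows_chunked(original, indices, n_chunks=2)
```

The result should match indexing all rows in one call, `original[indices]`, only built in smaller pieces.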
numpy? or scipy.sparse? – hpaulj
[...] sparse.vstack much, but I think the same applies. sparse.vstack combines the coo attributes of its arguments, and uses them to make a new coo matrix. – hpaulj
[...] sparse.vstack. It delegates the task to sparse.bmat, bmat([[b] for b in blocks]). – hpaulj
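The relationship between sparse.vstack and sparse.bmat mentioned in the last comment can be checked with a small sketch. It only verifies that both calls give the same result on two made-up blocks; the internal implementation of vstack may differ between SciPy versions, so the delegation itself is taken from the comment rather than asserted here.

```python
import numpy as np
from scipy import sparse

# Two small illustrative blocks with the same number of columns.
blocks = [
    sparse.csr_matrix(np.array([[0, 0, 2, 1, 0]])),
    sparse.csr_matrix(np.array([[1, 1, 0, 2, 0],
                                [0, 0, 3, 4, 1]])),
]

# Stacking vertically ...
stacked = sparse.vstack(blocks)
# ... matches a block matrix with one block per block-row.
via_bmat = sparse.bmat([[b] for b in blocks])

print((stacked.toarray() == via_bmat.toarray()).all())
```

Since each call assembles a brand-new matrix from its arguments, calling vstack once per row in a loop copies the accumulated result again at every step, which is consistent with the slowness described in the question.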