I have a sparse matrix and I'm trying to append a sparse vector to it as a new row. I've tried different sparse formats (csr, csc, lil, coo) and different ways of appending the sparse vector to the sparse matrix, including vstack and concatenate.
Every combination of format and method turned out to be very slow. But when I convert the vector to dense format (with todense()) and append it to a dense matrix (numpy.ndarray specifically), it is done very quickly. Why is that? Is there a trick or a suitable format for this that I'm missing?
Here is my code for when I tried it with 'coo' format:
    from scipy.sparse import coo_matrix, rand
    from time import time as timer
    from numpy import array, concatenate, empty

    #### sparse appending in coo way ####
    def sparse_append(A):
        dim = A.shape[1]
        mat = coo_matrix((0, dim))
        sparse_addtime = 0
        for vector in A:
            st = timer()
            row = coo_matrix(vector)
            # rebuild the whole COO matrix with the new row's entries appended
            newdata = concatenate((mat.data, row.data))
            newrows = concatenate((mat.row, row.row + mat.shape[0]))
            newcols = concatenate((mat.col, row.col))
            mat = coo_matrix((newdata, (newrows, newcols)),
                             shape=(mat.shape[0] + 1, mat.shape[1]))
            et = timer()
            sparse_addtime += et - st
        return sparse_addtime

    #### dense append ####
    def dense_append(A):
        dim = A.shape[1]
        mat = empty([0, dim])
        dense_addtime = 0
        for vector in A:
            st = timer()
            mat = concatenate((mat, vector))
            et = timer()
            dense_addtime += et - st
        return dense_addtime

    #### main ####
    if __name__ == '__main__':
        dim = 400
        n = 200
        A = rand(n, dim, density=0.1, format='lil')
        B = A.todense()  # numpy.matrix (an ndarray subclass)
        t1 = sparse_append(A)
        t2 = dense_append(B)
        print(t1, t2)
Any help is appreciated.
scipy.sparse.hstack and scipy.sparse.vstack? – Sid
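For reference, here is a minimal sketch of the scipy.sparse.vstack route mentioned in the comment above. It assumes all the rows are available before the matrix is needed, so they can be collected in a plain Python list and stacked in a single call instead of rebuilding the matrix on every iteration. The helper name stack_rows_once is made up for illustration, and whether this is actually faster for a given workload would need to be timed.

```python
import numpy as np
from scipy.sparse import random as sparse_random, vstack


def stack_rows_once(A):
    """Collect each row of A as a 1 x dim sparse matrix, then stack once.

    Hypothetical helper: the rows come from an existing matrix here only
    to mirror the benchmark in the question.
    """
    # Appending to a Python list does not copy the rows gathered so far.
    rows = [A[i, :] for i in range(A.shape[0])]
    # A single vstack call builds the final matrix in one pass.
    return vstack(rows, format='csr')


# Same dimensions and density as the benchmark in the question.
A = sparse_random(200, 400, density=0.1, format='csr', random_state=0)
M = stack_rows_once(A)
```

The design choice is to defer building the sparse matrix until the end: the loop in the question constructs a new coo_matrix and copies every stored entry on each iteration, whereas here the copying happens once.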