I am trying to implement the following equation using scipy's sparse package:
W = x[:,0] * y[:,0].T + x[:,1] * y[:,1].T + ... + x[:,m-1] * y[:,m-1].T
where x and y are n x m csc_matrix objects. Basically, I'm trying to multiply each column of x by the transpose of the corresponding column of y and sum the resulting n x n matrices. I then want to set all non-zero elements to 1.
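Because entry (j, k) of that sum is the sum over i of x[j, i] * y[k, i], the whole sum is the single sparse product of x and y.T (for the csc_matrix type, `*` and `@` both mean matrix multiplication). Below is a minimal check on small random matrices; the sizes, density and seeds are arbitrary illustration values, not the real data.

```python
import numpy as np
from scipy import sparse

# Small stand-ins for x and y (sizes and density chosen arbitrarily for illustration)
n, m = 5, 8
x = sparse.random(n, m, density=0.3, format="csc", random_state=0)
y = sparse.random(n, m, density=0.3, format="csc", random_state=1)

# Sum of the m column-by-column outer products, as in the equation
W_loop = sparse.csc_matrix((n, n))
for i in range(m):
    W_loop = W_loop + x[:, i] @ y[:, i].T

# The same sum expressed as one sparse matrix product
W_prod = x @ y.T

print(np.allclose(W_loop.toarray(), W_prod.toarray()))  # True
```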
This is my current implementation:
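import numpy as np
from scipy import sparse

c = sparse.csc_matrix((n, n))
for i in range(m):
    # outer product of column i of x (bam.id2sym_thal) and column i of y (bam.id2sym_cort)
    tmp = bam.id2sym_thal[:, i] * bam.id2sym_cort[:, i].T
    # force every stored value of tmp to exactly 1, in place
    np.minimum(tmp.data, np.ones_like(tmp.data), tmp.data)
    np.maximum(tmp.data, np.ones_like(tmp.data), tmp.data)
    c = c + tmp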
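In the loop above, the minimum/maximum calls set each entry of tmp to 1, but `c = c + tmp` still adds those ones together, so an entry of c ends up counting how many columns contribute to it. If the goal is a 0/1 matrix, one way is to set the stored values once, after summing. This is a minimal sketch; the helper name `binarize` and the tiny matrix are hypothetical illustration data.

```python
import numpy as np
from scipy import sparse

def binarize(mat):
    """Return a CSC copy of a sparse matrix with every stored non-zero set to 1 (hypothetical helper)."""
    out = mat.tocsc(copy=True)
    out.eliminate_zeros()   # drop explicitly stored zeros so they stay 0
    out.data[:] = 1         # every remaining stored value becomes 1
    return out

# Made-up counts like the ones c can accumulate
c = sparse.csc_matrix(np.array([[0.0, 3.0], [1.0, 0.0]]))
print(binarize(c).toarray())
# [[0. 1.]
#  [1. 0.]]
```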
This implementation has the following problems:
Memory usage seems to explode. As I understand it, memory should only grow as c becomes less sparse, but the loop starts using more than 20 GB of memory with n = 10,000 and m = 100,000 (each row of x and y has only around 60 non-zero elements).
I'm using a Python loop, which is not very efficient.
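To check whether the growth really comes from c becoming denser, one option is to compare the bytes held by c's three CSC arrays (`data`, `indices`, `indptr`, the standard scipy attributes) against a rough ceiling. For n = 10,000 a fully dense result has at most 10^8 stored entries; at 8 bytes of float64 data plus 4 bytes of int32 row index each, that is about 1.2 GB, so by this estimate the final matrix alone should not account for 20 GB. The helper names below are hypothetical, and the byte sizes assume float64 values and int32 indices.

```python
import numpy as np
from scipy import sparse

def csc_nbytes(mat):
    """Bytes held by the data, indices and indptr arrays of a CSC matrix (hypothetical helper)."""
    return mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes

def dense_csc_upper_bound(n, data_bytes=8, index_bytes=4):
    """Rough ceiling for an n x n CSC matrix whose every entry is stored (assumes float64/int32)."""
    nnz = n * n
    return nnz * (data_bytes + index_bytes) + (n + 1) * index_bytes

# Small example: a 1000 x 1000 identity has 1000 stored entries
eye = sparse.identity(1000, format="csc")
print(csc_nbytes(eye))                       # 1000*8 + 1000*4 + 1001*4 = 16004
print(dense_csc_upper_bound(10_000) / 1e9)   # about 1.2 (GB)
```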
My question: Is there a better way to do this? Controlling memory usage is my first concern, but it would be great to make it faster!
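One loop-free alternative, sketched here under the assumption that "non-zero" should mean structurally non-zero (some column i has both x[j, i] and y[k, i] non-zero, which is what the clamping in the loop produces), is to replace the stored values of x and y by ones, form a single product, and set the result's values to 1. The function name is hypothetical and the usage sizes are scaled-down illustration values, not the n = 10,000, m = 100,000 case; memory use on the real data is not measured here.

```python
import numpy as np
from scipy import sparse

def binary_outer_sum(x, y):
    """0/1 pattern of sum_i x[:, i] * y[:, i].T, computed as one product (hypothetical helper)."""
    # Copy the sparsity patterns with every stored value replaced by 1,
    # so positive and negative entries cannot cancel in the product
    xb = x.tocsc(copy=True)
    xb.eliminate_zeros()
    xb.data[:] = 1
    yb = y.tocsc(copy=True)
    yb.eliminate_zeros()
    yb.data[:] = 1
    w = (xb @ yb.T).tocsc()   # n x n; entry (j, k) counts shared columns
    w.data[:] = 1             # keep only the pattern
    return w

# Scaled-down usage; compare against the column-by-column loop
n, m = 50, 200
x = sparse.random(n, m, density=0.05, format="csc", random_state=0)
y = sparse.random(n, m, density=0.05, format="csc", random_state=1)

w_loop = sparse.csc_matrix((n, n))
for i in range(m):
    w_loop = w_loop + x[:, i] @ y[:, i].T
w_loop.data[:] = 1

print((binary_outer_sum(x, y) != w_loop).nnz)  # 0: identical patterns
```

Working on 0/1 copies of the inputs means the product counts shared columns instead of summing real values, which matches the loop's behaviour of turning every outer-product entry into 1 before adding.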
Thank you!
x[:,i] is going to give you the ith column of x, not the row. – JoshAdely
, not x. (Either that, or the title is wrong.) – Steve Tjoa