6
votes

I am trying to implement the following equation using scipy's sparse package:

W = x[:,1] * y[:,1].T + x[:,2] * y[:,2].T + ...

where x & y are a nxm csc_matrix. Basically I'm trying to multiply each col of x by each col of y and sum the resulting nxn matrices together. I then want to make all non-zero elements 1.

This is my current implementation:

    c = sparse.csc_matrix((n, n))
    for i in xrange(0,m):
        tmp = bam.id2sym_thal[:,i] * bam.id2sym_cort[:,i].T
        minimum(tmp.data,ones_like(tmp.data),tmp.data)
        maximum(tmp.data,ones_like(tmp.data),tmp.data)

        c = c + tmp

This implementation has the following problems:

  1. Memory usage seems to explode. As I understand it, memory should only increase as c becomes less sparse, but I am seeing that the loop starts eating up >20GB of memory with a n=10,000, m=100,000 (each row of x & y only has around 60 non-zero elements).

  2. I'm using a python loop which is not very efficient.

My question: Is there a better way to do this? Controlling memory usage is my first concern, but it would be great to make it faster!

Thank you!

2
x[:,i] is going to give you the ith column of x, not the rowJoshAdel
@JoshAdel: You are right, I misspoke, I meant to say multiply the columns of x by columns of y. I have updated the question. Thanks!RussellM
Your equation is a sum of inner products, not outer products. You must transpose the columns of y, not x. (Either that, or the title is wrong.)Steve Tjoa
Please edit your question to be unambiguous respect to transpose. Are you aiming to count how many times each nonzero element is summed in outer product? Thankseat
@Steve: You are right Steve- I have made the correction. ThanksRussellM

2 Answers

3
votes

Note that a sum of outer products in the manner you describe is simply the same as multiplying two matrices together. In other words,

sum_i X[:,i]*Y[:,i].T == X*Y.T

So just multiply the matrices together.

Z = X*Y.T

For n=10000 and m=100000 and where each column has one nonzero element in both X and Y, it computes almost instantly on my laptop.

0
votes

In terms of memory and performance, this might be a prime candidate for using Cython.

There is a section of the following paper describing its use with sparse scipy matricies:

http://folk.uio.no/dagss/cython_cise.pdf