In Python with numpy, say I have two matrices:

- `S`, a sparse x*x matrix
- `M`, a dense x*y matrix

Now I want to do `np.dot(M, M.T)`, which will return a dense x*x matrix `S_`.
However, I only care about the cells that are nonzero in `S`, which means that it would not make a difference for my application if I did

S_ = S * S_

Obviously, that would be a waste of operations, as I would like to leave out the irrelevant cells given in `S` altogether. Remember that in matrix multiplication

S_[i,j] = np.sum(M[i,:] * M[j,:])

(the second factor is `M.T`, so its column j is row j of `M`). So I want to do this operation only for i,j such that S[i,j] = True.
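To make sure we are talking about the same thing, here is a minimal sketch of the masking idea (sizes, seed, and density are made up for illustration): compute the full dense product once, then keep only the cells in the sparsity pattern of `S` via scipy's elementwise `multiply`:

```python
import numpy as np
import scipy.sparse as sp

# Made-up small sizes for illustration: x = 4, y = 3
rng = np.random.default_rng(0)
M = rng.random((4, 3))                     # dense x*y
S = sp.random(4, 4, density=0.25, format='csr', random_state=0)

# Pattern mask: same sparsity as S, every stored value set to 1
mask = S.copy()
mask.data[:] = 1.0

S_full = M @ M.T                           # dense x*x product (the wasteful part)
S_ = mask.multiply(S_full)                 # sparse: only cells nonzero in S survive
```

This is exactly the wasteful `S_ = S * S_` variant: the full x*x product is still computed before masking.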
Is this supported somehow by numpy implementations that run in C so that I do not need to implement it with python loops?
EDIT 1 [solved]: I still have this problem; actually, `M` is now also sparse. Now, given `rows` and `cols` of `S`, I implemented it like this:
data = np.array([ M[rows[i],:].dot(M[cols[i],:].T).data[0] for i in xrange(len(rows)) ])
S_ = csr_matrix( (data, (rows,cols)) )

(where `csr_matrix` is `scipy.sparse.csr_matrix`; the `.T` on the second row is needed so the dot of two 1*y sparse rows is well-defined)
... but it is still slow. Any new ideas?
EDIT 2: jdehesa has given a great solution, but I would like to save more memory.
The solution was to do the following:

data = M[rows,:].multiply(M[cols,:]).sum(axis=1)

and then build a new sparse matrix from `rows`, `cols` and `data`.
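For concreteness, here is that approach as a self-contained sketch (sizes, densities, and seeds are made up); one practical detail is that `sum(axis=1)` returns a matrix, so it is flattened before building the result:

```python
import numpy as np
import scipy.sparse as sp

M = sp.random(5, 3, density=0.6, format='csr', random_state=1)   # M is now sparse
S = sp.random(5, 5, density=0.2, format='csr', random_state=2)
rows, cols = S.nonzero()

# Elementwise product of the paired rows, summed along each row
data = np.asarray(M[rows, :].multiply(M[cols, :]).sum(axis=1)).ravel()
S_ = sp.csr_matrix((data, (rows, cols)), shape=S.shape)
```

The fancy indexing `M[rows, :]` materializes one submatrix per index array, which is where the memory cost described below comes from.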
However, when running the above line, scipy builds a (contiguous) numpy array with as many elements as the nnz of the first submatrix plus the nnz of the second submatrix, which can lead to a MemoryError in my case.
In order to save more memory, I would like to multiply iteratively each row with its respective 'partner' column, then sum over and discard the result vector. Using simple python to implement this, basically I am back to the extremely slow version.
Is there a fast way of solving this problem?
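One middle ground between the vectorized one-liner and the slow per-pair loop might be chunking (the function name and chunk size below are my own invention, a sketch rather than a definitive answer): process the (row, col) pairs in fixed-size chunks, so the intermediate submatrices only ever cover `chunk` pairs at a time, trading a short Python loop for bounded peak memory:

```python
import numpy as np
import scipy.sparse as sp

def masked_rowdot(M, rows, cols, chunk=10000):
    """Compute sum(M[r,:] * M[c,:]) for each pair (r, c), chunk pairs at a time.

    Only `chunk` rows of each submatrix are materialized at once,
    so peak memory stays bounded regardless of len(rows).
    """
    out = np.empty(len(rows))
    for start in range(0, len(rows), chunk):
        r = rows[start:start + chunk]
        c = cols[start:start + chunk]
        out[start:start + chunk] = np.asarray(
            M[r, :].multiply(M[c, :]).sum(axis=1)).ravel()
    return out

# Usage with made-up data
M = sp.random(6, 4, density=0.5, format='csr', random_state=3)
S = sp.random(6, 6, density=0.3, format='csr', random_state=4)
rows, cols = S.nonzero()
data = masked_rowdot(M, rows, cols, chunk=3)
S_ = sp.csr_matrix((data, (rows, cols)), shape=S.shape)
```

The Python loop now runs len(rows)/chunk times instead of len(rows) times, so for a large enough chunk the overhead should be small while the per-chunk allocation stays fixed.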
Comment from hpaulj: A small example might help, just to make sure we are on the same page. Unless `S` is quite sparse, the extra selection work might cancel out any calculation-time savings.