4
votes

In python with numpy, say I have two matrices:

  1. S, a sparse x*x matrix
  2. M, a dense x*y matrix

Now I want to do np.dot(M, M.T) which will return a dense x*x matrix S_.

However, I only care about the cells that are nonzero in S, which means that it would not make a difference for my application if I did

S_ = S*S_

Obviously, that would be a waste of operations as I would like to leave out the irrelevant cells given in S alltogether. Remember that in matrix multiplication

S_[i,j] = np.sum(M[i,:]*M[:,j])

So I want to do this operation only for i,j such that S[i,j]=True.

Is this supported somehow by numpy implementations that run in C so that I do not need to implement it with python loops?


EDIT 1 [solved]: I still have this problem, actually M is now also sparse.

Now, given rows and cols of S, I implemented it like this:

data = np.array([ M[rows[i],:].dot(M[cols[i],:]).data[0] for i in xrange(len(rows)) ])
S_   = csr( (data, (rows,cols)) )

... but it is still slow. Any new ideas?

EDIT 2: jdehesa has given a great solution, but I would like to save more memory.

The solution was to do the following:

data = M[rows,:].multiply(M[cols,:]).sum(axis=1)

and then build a new sparse matrix from rows, cols and data.

However, when running the above line, scipy builds a (contiguous) numpy array with as many elements as nnz of the first submatrix plus nnz of the second submatrix, which can lead to MemoryError in my case.

In order to save more memory, I would like to multiply iteratively each row with its respective 'partner' column, then sum over and discard the result vector. Using simple python to implement this, basically I am back to the extremely slow version.

Is there a fast way of solving this problem?

1
What's the size of these arrays? what's the sparsity of S. A small example might help, just make sure we are on the same page. Unless S is quite sparse the extra selection work might cancel out any calculation time savings.hpaulj
The problem is that the output matrix would be too large for memory, if it would be calculated as a whole.Radio Controlled

1 Answers

1
votes

Here is how you can do it with NumPy/SciPy, both for dense and sparse M matrices:

import numpy as np
import scipy.sparse as sp

# Coordinates where S is True
S = np.array([[0, 1],
              [3, 6],
              [3, 4],
              [9, 1],
              [4, 7]])

# Dense M matrix
# Random big matrix
M = np.random.random(size=(1000, 2000))
# Take relevant rows and compute values
values = np.sum(M[S[:, 0]] * M[S[:, 1]], axis=1)
# Make result matrix from values
result = np.zeros((len(M), len(M)), dtype=values.dtype)
result[S[:, 0], S[:, 1]] = values

# Sparse M matrix
# Construct sparse M as COO matrix or any other way
M = sp.coo_matrix(([10, 20, 30, 40, 50],  # Data
                   ([0, 1, 3, 4, 6],      # Rows
                    [4, 4, 5, 5, 8])),    # Columns
                  shape=(1000, 2000))
# Convert to CSR for fast row slicing
M_csr = M.tocsr()
# Take relevant rows and compute values
values = M_csr[S[:, 0]].multiply(M_csr[S[:, 1]]).sum(axis=1)
values = np.squeeze(np.asarray(values))
# Construct COO sparse matrix from values
result = sp.coo_matrix((values, (S[:, 0], S[:, 1])), shape=(M.shape[0], M.shape[0]))