I'm trying to compute the dot product of a huge sparse matrix with its own transpose.
Shape of the matrix: (371744, 36154). Number of non-zero elements: 577731 (very sparse).
mat1 is a scipy.sparse.csr_matrix. If I use mat1 * mat1.T I get a ValueError. This looks like it's because there are too many non-zero elements in the resulting matrix and the index pointer overflows, according to here:
dp_data = data_m * data_m.T
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 247, in __mul__
return self._mul_sparse_matrix(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 300, in _mul_sparse_matrix
return self.tocsr()._mul_sparse_matrix(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 290, in _mul_sparse_matrix
indices = np.empty(nnz, dtype=np.intc)
ValueError: negative dimensions are not allowed
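To sanity-check the overflow explanation without forming the product, here is a rough sketch that bounds the number of stored entries in A * A.T from the per-column non-zero counts of A. The helper name product_nnz_upper_bound is made up for illustration, and the result is only an upper bound (the sum of squared column counts), not the exact count SciPy computes internally. If the bound already exceeds the maximum of np.intc, the dtype in the traceback above, an overflow is at least plausible.

```python
import numpy as np
import scipy.sparse as sp


def product_nnz_upper_bound(mat):
    """Upper bound on the stored entries of mat * mat.T (hypothetical helper).

    Entry (i, k) of the product can only be non-zero if rows i and k share
    a column j, so a column with c_j non-zeros contributes at most c_j**2
    entries. Summing over all columns gives the bound.
    """
    csc = sp.csc_matrix(mat)
    # Non-zeros per column; use int64 so the squares cannot overflow here.
    col_counts = np.diff(csc.indptr).astype(np.int64)
    return int(np.sum(col_counts ** 2))


def fits_in_intc(n):
    """True if n can be represented by np.intc, the index dtype in the traceback."""
    return n <= np.iinfo(np.intc).max
```

Calling fits_in_intc(product_nnz_upper_bound(data_m)) on the real matrix would show whether the result could even be indexed with 32-bit integers. A bound that fits does not guarantee success, since memory can still run out.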
I also tried np.dot, but the docs say:
"As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore using it will result on unexpected results or errors. The corresponding dense matrix should be obtained first instead"
When I call mat1.toarray() or todense() I get a MemoryError because the matrix is huge, and I only have 16 GB of memory. The program works fine for smaller inputs.
data_array = data_m.toarray()
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 550, in toarray
return self.tocoo(copy=False).toarray()
File "/usr/lib/python2.7/dist-packages/scipy/sparse/coo.py", line 219, in toarray
B = np.zeros(self.shape, dtype=self.dtype)
MemoryError
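For reference, a back-of-the-envelope calculation of what the dense array would need. This assumes the dtype is float64 (8 bytes per element), which is a guess since the dtype is not shown above. A smaller dtype would shrink the figure proportionally.

```python
import numpy as np

# Shape taken from the question; float64 is an assumption about the dtype.
rows, cols = 371744, 36154
itemsize = np.dtype(np.float64).itemsize  # bytes per element

bytes_needed = rows * cols * itemsize
gib_needed = bytes_needed / 2.0 ** 30

print("dense array would need about %.1f GiB" % gib_needed)
```

Under that assumption the requirement is on the order of 100 GiB, far beyond 16 GB, so the dense route is not an option for this input.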
I'm using NumPy version 1.8.1 and SciPy version 0.9.0.
How else do I do this multiplication?
dot attribute of the sparse matrix instead of np.dot? – Gabriel