
I am using Python and the SciPy library to create a sparse matrix, specifically a csr_matrix (Compressed Sparse Row matrix). The matrix is rather big, about 70000 x 70000 elements. I build the matrix as a 2d array and then construct the csr_matrix, passing the 2d array as an argument. Constructing a very sparse matrix of that size works without any issues.

The problem arises when I pass a denser 2d array (far fewer zero elements): the process is interrupted with an error:

ValueError: unrecognized csr_matrix constructor usage

I tried building a dense matrix of the same size in the interactive Python environment and got exactly the same error:

from scipy import sparse
a = [[10 for i in range(70000)] for j in range(70000)]
mat = sparse.csr_matrix(a)

So my questions are:

- Does constructing the csr_matrix depend on how sparse the 2d array is? What is the limit?

- How can I be sure the program won't be interrupted in the middle of processing by such errors?

- Are there any alternative solutions?

Thanks in advance


2 Answers


With smaller numbers your method works:

In [20]: a=[[10 for i in range(1000)] for j in range(1000)]
In [21]: M=sparse.csr_matrix(a)
In [22]: M
Out[22]: 
<1000x1000 sparse matrix of type '<class 'numpy.int32'>'
    with 1000000 stored elements in Compressed Sparse Row format>

Density is not the issue. Size probably is. I can't reproduce your error because when I try larger sizes my machine slows to a crawl and I have to interrupt the process.

As given in the documentation, csr_matrix accepts several kinds of input and recognizes them by the form of the argument (for tuples, by the number of items). I'd have to look at the code to remember the exact logic, but one form expects a tuple of 3 arrays or lists, (data, indices, indptr), and another a tuple of 2 items whose second item is itself a tuple, (data, (row_ind, col_ind)). A third is a dense numpy array. Your case, a list of lists, doesn't fit any of those, but it probably tries to turn it into an array:

a = np.array([[10 for i in range(M)] for j in range(N)])
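For reference, here is a small sketch of the constructor forms mentioned above, using a made-up 3 x 3 matrix. The first three calls should all produce the same matrix; the last form, a plain (M, N) shape tuple, creates an empty matrix of that shape. Passing another sparse matrix as the argument is a further accepted form, not shown here.

```python
import numpy as np
from scipy import sparse

# Made-up example matrix used for every form below.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])

# 1. Dense input: anything np.asarray can turn into a 2-D array.
A = sparse.csr_matrix(dense)

# 2. (data, (row_ind, col_ind)): coordinate-style triplets.
data = np.array([1, 2, 3, 4, 5, 6])
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
B = sparse.csr_matrix((data, (row, col)), shape=(3, 3))

# 3. (data, indices, indptr): the raw CSR arrays.
#    Row i's column indices are indices[indptr[i]:indptr[i + 1]].
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])
C = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

# 4. (M, N): an empty matrix with the given shape.
E = sparse.csr_matrix((3, 3))

print(A.toarray())
```

Only form 1 ever needs the full dense array; forms 2 and 3 take just the nonzeros.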

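Most likely your error message is the result of some sort of memory error: you are trying to make too large a matrix. A dense 70000 x 70000 matrix is big (4.9 billion elements, about 39 GB as 64-bit integers), and a sparse matrix representing the same fully dense matrix will be even larger. In COO format it stores each element 3 times: once for the value and twice for the coordinates. CSR compresses the row coordinates into one pointer per row, but it still stores a value and a column index for every element.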
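The arithmetic behind these sizes can be sketched as follows. This is a rough estimate under stated assumptions: 8-byte values and 8-byte indices. 64-bit indices would be needed here because the number of stored elements (4.9 billion) exceeds 2**31 - 1; for smaller matrices SciPy typically uses 4-byte int32 indices. The helper names are made up for illustration.

```python
# Rough memory estimates in bytes. Itemsizes are assumptions: 8-byte
# values, and 8-byte indices (needed once counts exceed 2**31 - 1).

def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes for a dense n_rows x n_cols array with the given element size."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, data_itemsize=8, index_itemsize=8):
    """Bytes for the three CSR arrays: data and indices hold one entry per
    stored element, indptr holds one entry per row plus one."""
    return nnz * (data_itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


n = 70000
dense = dense_bytes(n, n)       # 39_200_000_000 bytes, about 39 GB
csr_full = csr_bytes(n, n * n)  # every element stored: about 78 GB
csr_diag = csr_bytes(n, n)      # one nonzero per row: about 1.7 MB

print(dense, csr_full, csr_diag)
```

For a matrix that actually exists, M.data.nbytes + M.indices.nbytes + M.indptr.nbytes gives the real size of the stored arrays.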

A truly sparse matrix of that size works because the sparse representation is much smaller: its size is roughly proportional to the number of nonzero elements (about 3 stored numbers per nonzero in COO, 2 in CSR plus one pointer per row).
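When the nonzeros are known up front, the dense intermediate can be skipped entirely by passing (data, (row_ind, col_ind)) triplets straight to csr_matrix. A minimal sketch, assuming a hypothetical sparsity pattern with a single value of 10 on each diagonal entry:

```python
import numpy as np
from scipy import sparse

n = 70000

# Hypothetical sparsity pattern: one nonzero (10) on each diagonal entry.
rows = np.arange(n)
cols = np.arange(n)
data = np.full(n, 10)

# No n x n list of lists or dense array is ever created.
M = sparse.csr_matrix((data, (rows, cols)), shape=(n, n))

# Memory actually held by the three CSR arrays.
stored_bytes = M.data.nbytes + M.indices.nbytes + M.indptr.nbytes
print(M.shape, M.nnz, stored_bytes)
```

This stays around a megabyte or two because only 70000 values are stored, not 4.9 billion.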


In scipy/sparse/compressed.py (abridged):

class _cs_matrix(...):
    """base matrix class for compressed row and column oriented matrices"""
    def __init__(self, arg1, ...):
        if <arg1 is a sparse matrix>:
            ...
        elif <arg1 is a tuple>:
            ...
        else:
            # must be dense
            try:
                arg1 = np.asarray(arg1)
            except:
                raise ValueError("unrecognized %s_matrix constructor usage" % self.format)

My guess is that it tries:

np.asarray([[10 for i in range(70000)] for j in range(70000)])

and that this results in some sort of error, most likely a 'too large' or memory error. That error is caught and re-raised with this 'unrecognized ...' message.
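The catch-and-reraise pattern in that excerpt is what hides the real cause. A small sketch of the mechanism, using a hypothetical stand-in function that fails the way a too-large np.asarray call might. Under Python 3 the original exception is kept as `__context__` and printed in the traceback ("During handling of the above exception, another exception occurred"); Python 2 had no exception chaining, so only the ValueError would be visible there.

```python
def failing_asarray(arg):
    # Hypothetical stand-in for np.asarray running out of memory.
    raise MemoryError("cannot allocate dense array")


def construct(arg1, to_array=failing_asarray):
    # Same structure as the excerpt: try the dense conversion and, on any
    # failure, raise a generic "unrecognized usage" error instead.
    try:
        return to_array(arg1)
    except Exception:
        raise ValueError("unrecognized csr_matrix constructor usage")


try:
    construct([[10, 10], [10, 10]])
except ValueError as err:
    message = str(err)
    # Python 3 records the exception being handled as __context__,
    # so the underlying MemoryError is still reachable.
    original = err.__context__

print(message)
print(type(original).__name__, original)
```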

Try

import numpy as np
A = np.array(a)
M = sparse.csr_matrix(A)

I suspect it will give you a more informative error message.


Check out the last two examples on creating sparse matrices:
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csr_matrix.html

You can probably find the answers to your other questions in the documentation as well.
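As one possible alternative for the question's third point, the CSR arrays can be assembled row by row so that no full 2d array is ever held in memory. This is a sketch under the assumption that rows can be produced one at a time (for example by a generator) and are mostly zero; the function and generator names are hypothetical. It still needs memory proportional to the number of nonzeros, so a genuinely dense 70000 x 70000 matrix remains out of reach.

```python
import numpy as np
from scipy import sparse


def csr_from_rows(rows, n_cols):
    """Build a CSR matrix from an iterable of rows, keeping only nonzeros.

    No dense n_rows x n_cols array is allocated; only the nonzero
    values and their column indices are collected.
    """
    data, indices, indptr = [], [], [0]
    for row in rows:
        for j, value in enumerate(row):
            if value != 0:
                indices.append(j)
                data.append(value)
        # indptr[i + 1] marks where row i ends in data/indices.
        indptr.append(len(indices))
    n_rows = len(indptr) - 1
    return sparse.csr_matrix(
        (np.array(data),
         np.array(indices, dtype=np.int64),
         np.array(indptr, dtype=np.int64)),
        shape=(n_rows, n_cols),
    )


# Hypothetical input: rows yielded one at a time by a generator.
def make_rows():
    yield [1, 0, 2]
    yield [0, 0, 3]
    yield [4, 5, 6]


M = csr_from_rows(make_rows(), n_cols=3)
print(M.toarray())
```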