0
votes

I am working with a code that involves the creating of a large relatively sparse matrix and the solution of a least squares minimization problem using it. However, I have been getting memory errors when I run my code, despite the fact that it seems that the matrix should not be large enough to strain my system (27000 by 2100 roughly, in my test cases)

I have created a simplified code that has the same storage requirements as my test case and also generates a memory error (note that the "sparse" matrix is not actually very sparse as I am testing with a smaller scale problem than what the actual intended dataset will entail):

import numpy as np
from scipy import sparse

BM = sparse.lil_matrix((27000, 3000))
for i in range(0, 3000):
    local_mat = np.random.rand(30,30,30)
    local_mat[local_mat<0.1] = 0
    vals = local_mat.ravel()
    nonzero = vals.nonzero()
    BM[nonzero, i] = vals[nonzero]

If I change the parameters such that there are more zero entries in the sparse matrix, I will still get a memory error from scipy.sparse.linalg.lsq_linear after filling the rows of the matrix and performing a minimization problem with it

It goes without saying that I get a memory error if I use a dense matrix as well.

I have tried to increase the size of my paging file to 2-4 gigabytes but that hasn't helped, though it seems like this should not be that memory intensive regardless

1
When I loop 30 times, BM.nnz is 729012. The last iteration has set 24182 of possible 27000 elements.. This is not sparse at all! - hpaulj
Yes that's more or less a quirk of the specific test case i'm using, since the code is oriented around particle measurements in 3d space, but I've been using 30x30x30 densely packed windows before moving on to large 1000x1000x100 data sets. - George
I don't want to intentionally create a memory error or otherwise hang my computer. You'll have to sort out these issues yourself. Keep a watch on the matrix sparsity, and watch out for any inadvertent conversion to dense. If the memory error produces a traceback, examine that,and maybe even post it. - hpaulj
Alright. In general, do you know of any alternative solutions? Even calling an operation like nonzero() on the dense version of the matrix causes a memory error - George
A useful sparse matrix will be 90% zeros (or higher) - hpaulj

1 Answers

1
votes

You've created a memory bonfire. Let's recall how a lil_matrix works:

Each row of the matrix is kept as a python list. There is one list for data and another list for index. Each non-zero element has one entry on each list.

Taking your example 27000, 3000 matrix, let's add this up. Each entry in a python list is an object which burns 16 bytes for overhead. For the data list, the float data itself is another 8 bytes. So just the non-zero values eats up 2GB. Now let's look at the indices - another set of lists each entry of which burns 16 bytes for overhead. Poof - there's another 1.5 GB gone. Bit more memory for the lists and the array that holds them, but that's not much compared to what's already gone down.

Compared to a 27000, 3000 dense array, which eats up 650MB of memory, you've managed to use an extra 5x as much memory, while also acquiring several major disadvantages compared to a numpy array. A standard sparse format like CSR would be a much better choice - that would use just about 900MB of memory for your 90% dense example.

EDIT: Also you're trying to call a function which absolutely requires a CSR matrix, so that function is casting the horrible lil_matrix you've created into a csr_matrix, forcing you to keep two full copies of this data in-memory.