7
votes

I have a very large, sparse matrix of size 180 GB (as text; 30k × 3M) containing only the entries and no additional data. I have to perform matrix multiplication, inversion, and similar linear algebra operations on it. I tried Octave and simple single-threaded C code for the multiplication, but my system's 40 GB of RAM gets used up very quickly and the program starts thrashing. Are there any other options available to me? I am not familiar with MATLAB or any other matrix library that could help with this.

When I run a simple multiplication of a matrix with 10 rows and 3M columns by its transpose, I get the following error:

    memory exhausted or requested size too large for range of Octave's index type

I am not sure whether the same operation would work in MATLAB. Is there another library or code for sparse matrix representation and matrix multiplication?

3
Are you saying that the full matrix data is 180GB, or do you mean that the sparse representation itself is 180GB? What are the matrix dimensions, and how many non-zero elements do you have? - paddy
If I get it right, you are able to load the entire 180GB matrix into an Octave variable, and you run into memory trouble as soon as you try to work with the huge variable? Can you cast/convert the huge variable to sparse, e.g., m=readFromFile( hugeFileName.txt );m=sparse(m);? - Shai
You have to import your matrix in blocks, cast each imported block to sparse, and store it in a cell array. Once you have imported all blocks, concatenate them all at once. You will notice that the 180GB vanish if your sparsity is 99%. - Oleg
See this discussion about the size limit of a matrix in Octave (sparse matrices included). It basically boils down to the fact that Octave uses a 32-bit integer internally to index the matrix. You can build Octave with 64-bit indexing, but all of Octave's dependencies will also need it. - carandraug
MATLAB allows indices in sparse to be up to 2^48-1 = 281474976710655 (on a 64-bit OS), and 3e4 * 3e6 is smaller than that. - Oleg
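
To make the numbers in these comments concrete, here is a small arithmetic check in Python. It assumes a signed 32-bit index (maximum 2^31-1) and 8-byte doubles for dense storage. Both are assumptions for illustration, not statements about any particular Octave or MATLAB build.

```python
# Element count of the 30k x 3M matrix from the question.
rows, cols = 30_000, 3_000_000
n_elements = rows * cols                  # 90_000_000_000

int32_max = 2**31 - 1                     # assumed signed 32-bit index limit
quoted_sparse_max = 2**48 - 1             # limit quoted in the comment above

fits_32bit_index = n_elements <= int32_max
fits_quoted_limit = n_elements <= quoted_sparse_max

# Dense storage cost, assuming 8 bytes per entry.
dense_gigabytes = n_elements * 8 / 1e9

print(fits_32bit_index, fits_quoted_limit, dense_gigabytes)
```

Under these assumptions the full index range does not fit a 32-bit index, it does fit the quoted 2^48-1 limit, and dense storage would need far more than 40 GB of RAM.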

3 Answers

1
votes

If there are few enough nonzero entries, I suggest creating a sparse matrix S with the appropriate dimensions and maximum number of nonzero entries; see matlab create sparse matrix. Then, as @oleg komarov described, load the matrix in blocks and assign the nonzero entries from each block to the correct positions in S. If your matrix is sparse enough, loading it is really the only difficulty you face. I had similar issues with large transfer operators.
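
A minimal sketch of this block-wise loading idea, written in plain Python rather than MATLAB. It assumes the text file stores one dense row per line with whitespace-separated values; the question does not state the format, so this is a guess. The names `block_to_triplets` and `load_sparse` are made up for illustration.

```python
import io
from itertools import islice


def block_to_triplets(lines, first_row):
    """Convert one block of dense text rows to (row, col, value) triplets."""
    triplets = []
    for offset, line in enumerate(lines):
        for col, field in enumerate(line.split()):
            value = float(field)
            if value != 0.0:  # keep only the nonzero entries
                triplets.append((first_row + offset, col, value))
    return triplets


def load_sparse(stream, block_rows=2):
    """Read `stream` block by block; the dense rows are discarded after use."""
    triplets, first_row = [], 0
    while True:
        block = list(islice(stream, block_rows))  # next block of text rows
        if not block:
            break
        triplets.extend(block_to_triplets(block, first_row))
        first_row += len(block)
    return triplets, first_row


# Tiny in-memory stand-in for the 180 GB file (hypothetical data).
demo = io.StringIO("0 0 1.5 0\n0 0 0 0\n2 0 0 0\n0 0 0 -1\n0 3 0 0\n")
triplets, n_rows = load_sparse(demo)
print(n_rows, triplets)
```

Only one block of dense text is held in memory at a time. The resulting zero-based triplets could then be handed to a sparse constructor such as MATLAB/Octave `sparse(i, j, v, m, n)` (after shifting to one-based indices) or `scipy.sparse.coo_matrix`.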

0
votes

Have you considered performing your processing in blocks? Transposition and multiplication work very well with block matrix processing (see https://en.wikipedia.org/wiki/Block_matrix), and that will get you around any index limitations.

This wouldn't help with matrix inversion, though, unless your matrix can be decomposed into blocks such that the blocks off the diagonal are completely empty, which your question doesn't state.
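
As an illustration of the block approach for multiplication, the sketch below uses the identity that A times its transpose equals the sum of the products of each column block with its own transpose. It is plain Python on small dense lists, purely to show the structure; the function names are hypothetical, and a real implementation would use sparse blocks read from disk.

```python
def matmul_transpose(a):
    """Return a * a^T for a matrix given as a list of rows."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]


def blocked_gram(a, block_cols):
    """Compute a * a^T by accumulating one column block at a time."""
    n = len(a)
    total = [[0.0] * n for _ in range(n)]
    for start in range(0, len(a[0]), block_cols):
        # Only this slice of columns is needed for the current step.
        block = [row[start:start + block_cols] for row in a]
        partial = matmul_transpose(block)
        for i in range(n):
            for j in range(n):
                total[i][j] += partial[i][j]
    return total


a = [[1.0, 0.0, 2.0, 0.0, 0.0],
     [0.0, 3.0, 0.0, 0.0, 4.0]]
result = blocked_gram(a, block_cols=2)
print(result)
```

For the 10 × 3M case in the question, the accumulated result would be only 10 × 10, so each step needs just one column block in memory.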

0
votes

Octave has limits on both memory (about 2GB for 32-bit Octave) and the maximum number of elements a matrix can hold (about 2^31 with 32-bit indexing). MATLAB has no such memory limit, since it will use all of your memory resources, swap file included. You could therefore try MATLAB with a huge swap file and then run your operations (though it will still take quite a long time...).

If you are interested in other approaches, you may look into out-of-core computing, which aims to process huge datasets that cannot fit entirely in memory by storing them on disk and efficiently loading only the parts that are needed.
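
A minimal out-of-core sketch in plain Python: a matrix-vector product y = A x where A stays on disk and is streamed one nonzero at a time. It assumes, purely for illustration, a text file with one `row col value` triplet per line (zero-based indices). The file name and the function `streamed_matvec` are hypothetical.

```python
import os
import tempfile


def streamed_matvec(path, x, n_rows):
    """Compute y = A x, reading A's nonzeros from disk line by line."""
    y = [0.0] * n_rows
    with open(path) as f:
        for line in f:  # only one line of A is in memory at a time
            i, j, v = line.split()
            y[int(i)] += float(v) * x[int(j)]
    return y


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("0 2 1.5\n2 0 2.0\n1 1 -1.0\n")
    y = streamed_matvec(path, x=[1.0, 2.0, 4.0], n_rows=3)

print(y)
```

Only the vectors `x` and `y` are held in memory here, so the memory footprint is independent of the number of nonzeros stored on disk.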

For a practical approach, you may take a look at Blaze for Python (note: still in development!).