Segmentation Fault on ndarray matrices dot product

Question

I am computing the dot product of a matrix with 50000 rows and 100 columns with its transpose. The matrix values are floats.

A(50000, 100)

B(100, 50000)

Basically I get the matrix after performing SVD on a larger sparse matrix.
The matrix is of numpy.ndarray type.
I use numpy's dot method to multiply the two matrices, and I get a segmentation fault.

numpy.dot(A, B)

The dot product works on a matrix with 30000 rows but fails with 50000 rows.

Is there any limit on numpy's dot product?
Is there any problem with how I am using the dot product above?
Is there any other good Python linear algebra tool that is efficient on large matrices?

I think it's unfortunate that it segfaults rather than give an error message. However, a 50000x50000 double matrix requires ~20GB of memory. How much RAM does your computer have? — NPE
Gives a MemoryError on my box (NumPy 1.8.1). Be aware that the resulting matrix would take 9.3GB in float32 format, twice that in float64. — Fred Foo
@NPE I am running this on an 8GB machine. Perhaps that's the reason for the segmentation fault. Is there any efficient way in Python to do computations at this scale? — sravan_kumar
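
The sizes quoted in the comments can be checked with a quick calculation. This is only a sketch; the helper name result_size_gib is made up for illustration and is not part of NumPy.

```python
import numpy

def result_size_gib(rows, dtype):
    """Size in GiB of a dense (rows, rows) array of the given dtype.

    Hypothetical helper for illustration; it only does arithmetic and
    allocates nothing.
    """
    return rows * rows * numpy.dtype(dtype).itemsize / 2**30

# Result of a (50000, 100) matrix times its (100, 50000) transpose
print(round(result_size_gib(50000, numpy.float32), 1))  # 9.3
print(round(result_size_gib(50000, numpy.float64), 1))  # 18.6
```

Either figure is more than the 8GB of RAM mentioned above, while the operands themselves are only about 19 MiB each in float32.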

DrV · Accepted Answer · 2014-06-30T19:41:08

As you have been told, there is a memory problem. You want to do this:

numpy.dot(A, A.T)

which requires a lot of memory for the result (not the operands). However, the operation is easy to perform in pieces. You may use a loop-based approach to produce one output row at a time:

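import numpy

def trans_multi(A):
    rows = A.shape[0]
    # the full (rows, rows) result is still allocated in RAM here
    result = numpy.empty((rows, rows), dtype=A.dtype)
    for r in range(rows):
        # dot(A, A[r]) is column r of dot(A, A.T), which equals row r by symmetry
        result[r,:] = numpy.dot(A, A[r,:].T)
    return result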

On its own, this is just a slower and equally memory-hungry version (numpy.dot is well optimized). What you most probably want to do is write the result to a file, since you do not have the memory to hold it:

def trans_multi(A, filename):
    with open(filename, "wb") as f:
        rows = A.shape[0]
        for r in range(rows):
            # tobytes() replaces tostring(), which is deprecated in newer NumPy
            f.write(numpy.dot(A, A[r,:].T).tobytes())

Yes, it is not exactly lightning-fast. However, it is most probably the fastest you can hope for. Sequential writes are usually well-optimized. I tried:

a=numpy.random.random((50000,100)).astype('float32')
trans_multi(a,"/tmp/large.dat")

This took approximately 60 seconds, but it really depends on your HDD performance.
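
A possible variation, not part of the timing above, is to compute several output rows per numpy.dot call instead of one. The sketch below assumes a hypothetical function name trans_multi_blocks and a hypothetical block_rows parameter; the default of 1000 is a guess, not a tuned value. It writes the same row-major file layout as the one-row version.

```python
import numpy

def trans_multi_blocks(A, filename, block_rows=1000):
    """Write dot(A, A.T) to filename in row-major order, block_rows rows at a time.

    Sketch only: block_rows is an illustrative parameter. Each block is a
    (block_rows, rows) array, so peak extra memory is roughly
    block_rows * rows * itemsize bytes.
    """
    rows = A.shape[0]
    with open(filename, "wb") as f:
        for start in range(0, rows, block_rows):
            # Rows start..start+block_rows of the full result
            block = numpy.dot(A[start:start + block_rows], A.T)
            # tobytes() emits C (row-major) order, so consecutive blocks
            # concatenate into the full matrix on disk
            f.write(block.tobytes())
```

It is called the same way, for example trans_multi_blocks(a, "/tmp/large.dat"). With 50000 rows of float32 and the default block size, each block is about 200 MB. Whether this is actually faster depends on the disk, as noted above.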

Why not memmap?

I like mmap, and numpy.memmap is a great thing. However, numpy.memmap is optimized for having large tables and calculating small results from them. There is, for example, memmap.dot, which is optimized for taking dot products of memmapped arrays. The scenario there is that the operands are memmapped but the result is in RAM, which is exactly the opposite of the situation here.

Memmapping is very useful when you have random access. Here the access is not random but a sequential write. Also, if you try to use numpy.memmap to create a (50000,50000) array of float32 values, it will take some time (for some reason I do not understand; maybe it initializes the data even though there is no need).

However, after the file has been created, it is a very good idea to use numpy.memmap to analyze the huge table, as it gives the best random read performance and a very convenient interface.
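
Reading the file back might look like the following minimal sketch. The helper name open_result is made up for illustration. It assumes the file was written as raw row-major data with no header, as in trans_multi above, and that the dtype passed in matches the dtype of A.

```python
import numpy

def open_result(filename, rows, dtype=numpy.float32):
    """Map a raw (rows, rows) result file read-only without loading it into RAM.

    Hypothetical helper: dtype and rows must match what was written,
    because the file itself carries no shape or type information.
    """
    return numpy.memmap(filename, dtype=dtype, mode="r", shape=(rows, rows))
```

For the file written earlier this would be result = open_result("/tmp/large.dat", 50000), after which result[123, 456] or result[1000:1010, :] reads only the pages that are needed.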
