2
votes

I have a 50,000 by 50,000 dense matrix or larger. If I use the numpy or scipy- packages the entries of all my eigenvectors are 0. If I use scipy.sparse to calculate just 1000-8000 eigenvectors, I get the right eigenvectos. But I need all of them.

Could this be a memory problem? Or what is the reason for such a problem? Can I use LAPACK or ARPACK to calculate the right eigenvectors?

Please note my matrices are representations of networkx graphs and therefore sparse matrices. I transform them to dense matrices for using numpy.linalg, otherwise I work with scipy.sparse.linalg.

1
numpy.linalg.eig or scipy.sparse.eigs are more a less wrappers to Lapack or Arpack. It can be a precision problem (You are using double precision dtype=np.float64 right?)...max9111
@max9111 yes i am using dtype=np.float, I guess that is the same as dtype=np.float64...? If it is a precision problem, what could i do to solve it? Should i work with dtype=np.float32 and what would be the difference to dtype=np.float64?Python_Question
That would have been the easiest approach. Yes float equals float64. It looks like a quite complicate problem from a numerical point of view. Maybe this scicomp.stackexchange.com/q/7369/29186 helps..max9111
Please don't make more work for others by vandalizing your posts. By posting on the Stack Exchange (SE) network, you've granted a non-revocable right, under a CC BY-SA license, for SE to distribute the content (i.e. regardless of your future choices). By SE policy, the non-vandalized version is distributed. Thus, any vandalism will be reverted. Please see: How does deleting work? …. If permitted to delete, there's a "delete" button below the post, on the left, but it's only in browsers, not the mobile app.Makyen
scipy.linalg.eigh has an arg subset_by_value to get eigenpairs in a given range; can you try that with A.astype(np.float32) ? and eigcheck the results.denis

1 Answers

3
votes

The issue regarding numpy.linalg.eig() and scipy.linalg.eig() may be a memory error related to the size of the matrix: a 50000x50000 double precision dense matrix occupies 18Go of memory.

numpy.linalg.eig() and scipy.linalg.eig() rely on the routine DGEEV() of LAPACK. LAPACK DGEEV() and DGEEVX() makes use of the QR algorithm to compute all eigenvalues and eigenvectors of a dense matrix. First, the matrix is reduced to to upper Hessenberg form using dgehrd(), then QR iterations are performed in dhseqr(). In DGEEVX(), the matrix is first balanced and scaled.

On the other hand,scipy.sparse.linalg.eigs() and scipy.sparse.linalg.eigsh() rely on ARPACK functions implementing the Implicitly Restarted Arnoldi Method and Implicitly Restarted Lanczos Method on sparse matrix. Both are improvements of the power method and are very efficient at computing the largest eigenvalues/eigenvectors with improved accuracy. If Ax=b can be rapidly solved, these iterative methods are also very efficent at finding the smallest eigenvalues/eigenvectors, or eigenvalues/eigenvectors near a given value.

The difference between these methods is explained in Lloyd N. Trefethen and David Bau, NUMERICAL LINEAR ALGEBRA, Lecture 33. The Arnoldi Iteration.

... whereas the Arnoldi iteration is based upon the QR factorization (33.7) of the matrix ,whose columns are b, A b, ... ,A^{n-1} b, simultaneous iteration and the QR algorithm are based upon the QR factorization (28.16) of the matrix whose columns are A^n e_1 ... , A^n e_n .

From a practical viewpoint, the Arnoldi iteration is always applied to retreive a limited number of eigenvectors significantly lower than the size of the matrix. Nevertheless, the argument sigma of scipy.sparse.linalg.eigs() or scipy.sparse.linalg.eigsh() enables finding interior eigenvalues and eigenvectors near sigma. Hence, it is possible to call scipy.sparse.linalg.eigsh() multiple times using different sigma. If the eigenvalues all feature a limited multiplicity, all eigenvectors can be recovered. Potentiel duplicates are to be avoided by separating eigenvalues and tracking their multiplicity.

The basic call using sigma writes:

sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig,sigma=sig, which='LM', mode='normal')

If the matrix is Hermitian semi Here is a sample code, based on your previous deleted post, that computes all eigenvalues of a positive semi-definite sparse matrix. As the eigenvalues are all real and positive, sigma is progressively increased to find all eigenvalues:

import numpy as np
import networkx as nx
import scipy as sp




def tolist(sparsevalue, keeplast=False):
    listofeigval=[]
    listofmultiplicity=[]

    curreig=sparsevalue[0]-1.1*np.abs(sparsevalue[0])
    for i in range(len(sparsevalue)):
           #print curreig, sparsevalue[i], len(listofeigval)
           if np.abs(curreig-sparsevalue[i])<1e-11*(np.abs(curreig)+np.abs(sparsevalue[i])):
                listofmultiplicity[-1]=listofmultiplicity[-1]+1
           else :
                curreig=sparsevalue[i]
                listofeigval.append(curreig)
                listofmultiplicity.append(1)

    if keeplast==False:
        #the last one is not sure regarding multiplicity:
        listofeigval.pop()
        listofmultiplicity.pop()

    return listofeigval,listofmultiplicity

def test():
    N_1 = 2000
    R_1 = 0.1
    k = 0
    iterations = 1
    while k < iterations:
      G = nx.random_geometric_graph(N_1, R_1)
      if nx.is_connected(G) == True:
          print 'got connected network'
          k = k+1
          M=nx.laplacian_matrix(G)      #M is here a sparse matrix
          M = M.astype(float)
          #M[0,0]=M[0,0]+1.  # this makes the laplacian_matrix positive definite.
          #sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M, k = N_1-2)
          kkeig=400
          sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig, which='SM')
          print sparsevalue

          listofeigval=[]
          listofmultiplicity=[]
          listofeigval,listofmultiplicity=tolist(sparsevalue)
          print len(listofeigval), len(listofmultiplicity)

          nbeigfound=0
          for mul in listofmultiplicity:
              nbeigfound=nbeigfound+mul
          keepgoing=True
          while( nbeigfound<N_1):
               print '(',nbeigfound,'/',N_1,')  is ', listofeigval[-1]
               meanspacing=0.
               meanspacingnb=0.

               for i in range(10):
                     meanspacingnb=meanspacingnb+listofmultiplicity[len(listofeigval)-i-1]
               meanspacing=(listofeigval[-1]-listofeigval[len(listofeigval)-10])/meanspacingnb
               sig=listofeigval[-1]+0.1*kkeig*meanspacing

               keeplast=False
               if nbeigfound<N_1-0.5*kkeig:
                   sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig,sigma=sig, which='LM', mode='normal')
               else:
                   keeplast=True
                   sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig, which='LM')
               listofneweigval,listofnewmultiplicity=tolist(sparsevalue,keeplast)
               #need to retreive the starting point
               index=len(listofeigval)-2*kkeig/10
               if listofneweigval[1]>listofeigval[index]:
                   while(np.abs(listofneweigval[1]-listofeigval[index])>1e-10*(np.abs(listofneweigval[1])+np.abs(listofeigval[index]))):
                       index=index+1
               else:
                   while(np.abs(listofneweigval[1]-listofeigval[index])>1e-10*(np.abs(listofneweigval[1])+np.abs(listofeigval[index]))):
                       index=index-1
               #The first matching eigenvalue is found.
               #zipping the list and checking if it works.
               i=1
               while index<len(listofeigval) and i<len(listofneweigval) :
                    if (np.abs(listofneweigval[i]-listofeigval[index])>1e-10*(np.abs(listofneweigval[i])+np.abs(listofeigval[index]))):
                         print 'failed after ', index, ' different eigenvalues', ': wrong eigenvalue'
                         return listofeigval[0:index], listofmultiplicity[0:index], 1
                    if listofmultiplicity[index] != listofnewmultiplicity[i] :
                         print 'failed after ', index, ' different eigenvalues', ': wrong multiplicity'
                         return listofeigval[0:index], listofmultiplicity[0:index], 2
                    index=index+1
                    i=i+1
               #adding the remaining eigenvalues.
               while i<len(listofneweigval) :
                    listofeigval.append(listofneweigval[i])
                    listofmultiplicity.append(listofnewmultiplicity[i])
                    nbeigfound=nbeigfound+listofnewmultiplicity[i]
                    i=i+1
          print 'number of eigenvalues: ', nbeigfound
          nbl=0
          for i in range(len(listofeigval)):
               print 'eigenvalue ',i,' (',nbl,'/',N_1,')  is %14.8f'% listofeigval[i], ' with multiplicity', listofmultiplicity[i]
               nbl=nbl+listofmultiplicity[i]

          return listofeigval, listofmultiplicity, 0


          #sig=39.1
          #sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = 100,sigma=sig, which='LM', mode='normal')
          #print sparsevalue

test()

For Hermitian matrices featuring real positive and negative eigenvalues, both positive and negative values of sigma are to be explored. If the matrix is not Hermitian, the eigenvalues may not be real and values of sigma on the complex plane are to be chosen. Searching first for the magnitude of the largest eigenvalue of A limits the area to a disk.

The proposed method is very slow and may not always work. It worked once for a 20000x20000 matrix, using 1Go of memory.