4
votes

I was wondering if numpy has an efficient implementation to compute the largest or smallest eigenvalue of a symmetric matrix, without doing the full spectral decomposition if possible. I found the following modules that implement eigen-decomposition:

  1. scipy.linalg;
  2. numpy.linalg;
  3. scipy.sparse.linalg.

scipy.sparse.linalg.eigsh can output the k smallest (or largest) eigenvalues and eigenvectors; scipy.linalg.eigh also provides the option to select a subset of the eigenvalues; numpy.linalg.eigvalsh outputs all the eigenvalues. However, none of them seems efficient if I just want one particular eigenvalue.
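
For reference, here is how the same routines can be asked for just the smallest eigenvalue (a minimal sketch, assuming a recent SciPy where eigh takes subset_by_index instead of the older eigvals keyword):

import numpy as np
import scipy.linalg
import scipy.sparse.linalg

a = np.random.normal(0, 1, (100, 100))
b = a.dot(a.T)  # small symmetric test matrix

# dense solver: request only the eigenvalue with index 0 (the smallest)
w_min = scipy.linalg.eigh(b, subset_by_index=[0, 0], eigvals_only=True)

# iterative solver: 'SA' = smallest algebraic ('LA' gives the largest)
w_min_it = scipy.sparse.linalg.eigsh(b, k=1, which='SA', return_eigenvectors=False)

# numpy: eigvalsh returns all eigenvalues in ascending order, so take index 0
w_min_np = np.linalg.eigvalsh(b)[0]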

I ran a toy example to compare the time spent on finding the largest eigenvalue. All the methods give close enough solutions; the eigen-decomposition function in numpy.linalg seems the most efficient, even though it computes the full spectrum. Is there a better way to do the job?

Here is the test code and its output:

import numpy as np
import scipy.linalg
import scipy.sparse.linalg
import time


def test_scipy_eig(a):
    p = a.shape[0]
    # select only the last (index p-1, i.e. largest) eigenvalue;
    # newer SciPy spells this subset_by_index=[p-1, p-1]
    w = scipy.linalg.eigh(a, eigvals=[p-1, p-1], eigvals_only=True)
    return w


def test_scipy_sparse_eig(a):
    p = a.shape[0]
    # ARPACK (Lanczos) iteration: k=1 eigenvalue, 'LA' = largest algebraic
    w = scipy.sparse.linalg.eigsh(a, k=1, which='LA', return_eigenvectors=False)
    return w


def test_numpy_eig(a):
    # full spectrum, returned in ascending order
    w = np.linalg.eigvalsh(a)
    return w


p = 2000
a = np.random.normal(0,1,(p,p))
b = a.dot(a.T)

start = time.time()
w1 = test_scipy_eig(b)
t1 = time.time() - start


start = time.time()
w2 = test_numpy_eig(b)
t2 = time.time() - start


start = time.time()
w3 = test_scipy_sparse_eig(b)
t3 = time.time() - start
print "time expense:\n scipy:%f  numpy:%f scipy_sparse:%f " % (t1, t2, t3)

print "largest eigenvalue:\n scipy:%f  numpy:%f scipy_sparse:%f " % (w1[0], w2[-1], w3[0])

Output

time expense:
 scipy:1.427211  numpy:1.395954 scipy_sparse:4.520002 
largest eigenvalue:
 scipy:7949.429984  numpy:7949.429984 scipy_sparse:7949.429984 

1 Answer

2
votes

Your toy problem happens to be a difficult case for finding the largest eigenvalue via iterative methods, as you have several eigenvalues clustered around the largest one.

If you replace

a = np.random.normal(0,1,(p,p))

by

a = np.random.rand(p, p)

you get very different performance from the scipy.sparse solver.
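
A quick way to see the difference is to compare the relative gap between the two largest eigenvalues for the two constructions; a rough sketch:

import numpy as np

p = 2000
for name, a in [("normal", np.random.normal(0, 1, (p, p))),
                ("uniform", np.random.rand(p, p))]:
    b = a.dot(a.T)
    w = np.linalg.eigvalsh(b)       # all eigenvalues, ascending
    gap = (w[-1] - w[-2]) / w[-1]   # relative gap between the top two
    print("%s: largest=%g, relative gap=%g" % (name, w[-1], gap))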

The answer you are looking for is: the right method depends on the problem at hand, and cannot be determined from toy examples unless the eigenvalue structure, the sparsity structure, and the size of the toy example resemble those of the actual problem you are trying to solve.
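
To see why that gap matters for iterative methods, here is a minimal power-iteration sketch (eigsh actually uses ARPACK's Lanczos iteration, but the dependence on the ratio of the two largest eigenvalues is similar): the smaller the relative gap, the more iterations are needed.

import numpy as np

def power_iteration(b, tol=1e-8, maxiter=20000):
    """Plain power iteration; returns the largest-eigenvalue estimate and the step count."""
    x = np.random.normal(0, 1, b.shape[0])
    lam_old = 0.0
    for k in range(1, maxiter + 1):
        x = b.dot(x)
        x /= np.linalg.norm(x)
        lam = x.dot(b.dot(x))                    # Rayleigh quotient estimate
        if abs(lam - lam_old) < tol * abs(lam):  # stop when the estimate stabilizes
            break
        lam_old = lam
    return lam, k

p = 500
for name, a in [("normal", np.random.normal(0, 1, (p, p))),
                ("uniform", np.random.rand(p, p))]:
    lam, steps = power_iteration(a.dot(a.T))
    print("%s: largest eigenvalue ~ %g after %d iterations" % (name, lam, steps))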