Finding smallest eigenvectors of large sparse matrix, over 100x slower in SciPy than in Octave

Question

I am trying to compute few (5-500) eigenvectors corresponding to the smallest eigenvalues of large symmetric square sparse-matrices (up to 30000x30000) with less than 0.1% of the values being non-zero.

I am currently using scipy.sparse.linalg.eigsh in shift-invert mode (sigma=0.0), which I figured out through various posts on the topic is the prefered solution. However, it takes up to 1h to solve the problem in most cases. On the other hand the function is very fast, if I ask for the largest eigenvalues (sub seconds on my system), which was expected from the documentation.

Since I am more familiar with Matlab from work, I tried solving the problem in Octave, which gave me the same result using eigs (sigma=0) in just a few seconds (sub 10s). Since I want to do a parameter sweep of the algorithm including the eigenvector computation, that kind of time gain would be great to have in python as well.

I first changed the parameters (especially tolerance), but that didn't change much on the timescales.

I am using Anaconda on Windows, but tried to switch the LAPACK/BLAS used by scipy (which was a huge pain) from mkl (default Anaconda) to OpenBlas (used by Octave according to the documentation), but could not see a change in performance.

I was not able to figure out, whether there was something to change about the used ARPACK (and how)?

I uploaded a testcase for the code below to the following dropbox-folder: https://www.dropbox.com/sh/l6aa6izufzyzqr3/AABqij95hZOvRpnnjRaETQmka?dl=0

In Python

import numpy as np
from scipy.sparse import csr_matrix, csc_matrix, linalg, load_npz   
M = load_npz('M.npz')
evals, evecs = linalg.eigsh(M,k=6,sigma=0.0)

In Octave:

M=dlmread('M.txt');
M=spconvert(M);
[evecs,evals] = eigs(M,6,0);

Any help is appriciated!

Some additional options I tried out based on the comments and suggestions:

Octave: eigs(M,6,0) and eigs(M,6,'sm') give me the same result:

[1.8725e-05 1.0189e-05 7.5622e-06 7.5420e-07 -1.2239e-18 -2.5674e-16]

while eigs(M,6,'sa',struct('tol',2)) converges to

[1.0423 2.7604 6.1548 11.1310 18.0207 25.3933]

much faster, but only if the tolerance values is above 2, otherwise it does not converge at all and the values are strongly different.

Python: eigsh(M,k=6,which='SA') and eigsh(M,k=6,which='SM') both do not converge (ARPACK error on no convergence reached). Only eigsh(M,k=6,sigma=0.0) gives some eigenvalues (after almost an hour), which are different to octave for the smallest ones (even 1 additional small value is found):

[3.82923317e-17 3.32269886e-16 2.78039665e-10 7.54202273e-07 7.56251500e-06 1.01893934e-05]

If the tolerance is high enough I also get results from eigsh(M,k=6,which='SA',tol='1'), which come close the other obtained values

[4.28732218e-14 7.54194948e-07 7.56220703e-06 1.01889544e-05, 1.87247350e-05 2.02652719e-05]

again with a different number of small eigenvalues. The computation time is still almost 30min. While the different very small values might be understandable, since they might represent multiples of 0, the different multiplicity baffles me.

Additionally, there seems to be some fundamental differences in SciPy and Octave, which I cannot figure out, yet.

1 - I assume you meant to put brackets around [evals,evecs] in the octave code? 2 - can you include a small example for M? or maybe a generator script for one if that's possible? — Nick J
1 - Yes exactly, I edited my post. 2 - I tried out the performance for some sub-matrices of my data and it seems that Octave is always faster, but for smaller matrices below 5000x5000 its just a factor of 2-5 times, above that it gets really ugly. And since its "real data" I can not give a generator script. Is there a standard way to upload an example somehow? Due to the sparsity, a npz-file is reasonably small. — Spacekiller23
Thx. I included a dropbox link in the original post and updated the code to a working example. — Spacekiller23
Juts to reinforce your point, I tested on Matlab R2019b and got 84 seconds vs 36.5 mins in Python 3.7, Scipy 1.2.1 (26 times faster). — Bill

Anthony Gatti Anthony Gatti · Accepted Answer · 2020-04-17T03:20:15

I know this is old now, but I had the same problem. Did you review here (https://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html)?

It seems like when you set sigma to a low number (0) you should set which='LM', even though you are wanting to low values. This is because setting sigma transforms the values you want (low in this case) to appear to be high and so you still are able to take advantage of the 'LM' methods, which are much faster to get what you want (the low eigenvalues).

Finding smallest eigenvectors of large sparse matrix, over 100x slower in SciPy than in Octave

3 Answers

February 2020, TL;DR