6
votes

I am writing code to compute Classical Multidimensional Scaling (abbreviated to MDS) of a very large n by n matrix, n = 500,000 in my example.

In one step of MDS, I need to compute the highest three eigenvalues and their corresponding eigenvectors of a n by n matrix. This matrix is called the B matrix. I only need these three eigenvectors and eigenvalues. Common methods of calculating eigenvectors and eigenvalues of a large matrix take a long time, and I do not require a very accurate answer, so I am seeking an estimation of the eigenvectors and eigenvalues.

Some parameters:

  1. The B matrix is symmetric, real, and quite dense
  2. The eigenvalue decomposition of B in theory should always produce real numbers.
  3. I do not require an entirely precise estimation, just a fast one. I would need it to complete in several hours.
  4. I write in python and C++

My question: Are there fast methods of estimating the three highest eigenvectors and eigenvalues of such a large B matrix?

My progress: I have found a method of approximating the highest eigenvalue of a matrix, but I do not know if I can generalize it to the highest three. I have also found this paper written in 1996, but it is extremely technical and hard for me to read.

3
A matrix that size would require more than a terabyte of storage given 64-bit floating-point entries. Forget eigenvectors -- even doing a single matrix-vector multiplication looks painful.David Eisenstat
But there is no need to store the original matrix! It is indirectly given in the MDS algorithm and you can use that to perform matrix-vector multiplication without first computing the matrix.Hans Olsson
Have you looked at approximate MDS meant for big data? E.g. see pike.cs.ucla.edu/~weiwang/paper/CIMCV06.pdfGene

3 Answers

8
votes

G. Golub and C.F Van Loan Matrix Computations 2nd in chapter 9 state that Lanczos algorithms are one choice for this (except that the matrix should ideally be sparse - it clearly works for non-sparse ones as well)

https://en.wikipedia.org/wiki/Lanczos_algorithm

2
votes

You can get the highest eigenvector of B and then, transform the data into B' using that eigenvector. Then pop the first column of B' and get B'' so you can get the highest eigenvector of B'': it is enough information to compose a plausible second highest eigenvector for B. And then for the third.

About speed: you can randomly sample that huge dataset to be only a dataset of N items. If you are getting only three dimensions, I hope you can also get rid of most of the data to get an overview of the eigenvectors. You can call it: 'electoral poll'. I cannot help you in measuring the error rate, but I will try sampling 1k items, several times, and seeing if results are more or less the same.

Now you can get the mean of several 'polls' to build a 'prediction'.

0
votes

Have a look at suggestions in this thread

Largest eigenvalues (and corresponding eigenvectors) in C++

As suggested there you can use ARPACK package which has a C++ interface.