1 vote

I have a question concerning unseen samples which I want to classify (face or not a face). Using the ordinary Eigenface method (that is, without a reproducing kernel substituting the inner product of PCA), the evaluation is done by projecting the sample onto the eigenvectors obtained from PCA on the training-set matrix and finally testing the distance between the sample and its projection onto the span of those eigenvectors (i.e. the minimal distance to the eigenvector subspace) against a threshold.
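For reference, this is the evaluation step I mean, as a minimal NumPy sketch (the function names, the SVD route to the eigenfaces and the threshold are my own illustration, not a fixed recipe); the line that subtracts the training mean is the one my question is about:

    import numpy as np

    def eigenface_model(X_train, k):
        """Mean face and top-k eigenfaces from PCA on the (N, d) training matrix."""
        mu = X_train.mean(axis=0)
        # Rows of Vt are the eigenvectors of the training covariance.
        _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
        return mu, Vt[:k].T                          # mu: (d,), W: (d, k)

    def accept_as_face(x, mu, W, threshold):
        """Subtract the mean, project, and threshold the distance from face space."""
        y = W.T @ (x - mu)                           # coordinates in the eigenface basis
        x_hat = mu + W @ y                           # reconstruction from the eigenfaces
        return np.linalg.norm(x - x_hat) < threshold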

I have combed through several publications discussing the KPCA approach, but when it comes to the final step of testing unseen samples, I ran into a tiny, unanswered problem:

Using ordinary PCA, the mean of the training set is subtracted from the test vector before it is projected onto the eigenvectors. Not so for KPCA. I guess the problem here is that we do not have access to the points in the kernel-induced feature space, only to their inner products (the kernel values). Hence, we have no explicit "mean". However, isn't this at least worth discussing?
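To make the gap explicit (my own notation, nothing more): ordinary PCA evaluates

    y_j = v_j^\top (x - \mu), \qquad \mu = \tfrac{1}{N} \sum_{i=1}^{N} x_i,

whereas KPCA only ever touches the feature map \varphi through the kernel values k(x, x_i) = \langle \varphi(x), \varphi(x_i) \rangle, so the feature-space mean \mu_\varphi = \tfrac{1}{N} \sum_i \varphi(x_i) is never available as an explicit point that could be subtracted.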

Thanks for opinions and suggestions; I think this is a kind of inaccuracy that has not been addressed so far.


1 Answer

0 votes

In the early KPCA work by Schölkopf (there is a review paper with him as an author whose title involves "nonlinear component analysis"), there was a KPCA variant that involved subtracting the mean in feature space. It makes things more computationally complex, and follow-up algorithms that keep the centering (like estimating the inverse mapping, or out-of-sample estimation) also get more complex. Complexity here is mostly about computation: more complex = more costly.
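To make the extra work concrete, here is a minimal NumPy sketch of that centred variant (my own implementation choices: a Gaussian kernel, eigh for the eigendecomposition, and the function names; this is not code from the paper). The mean is never formed as a point in feature space; it shows up as a double centring of the Gram matrix, and the same centring has to be repeated for every unseen sample:

    import numpy as np

    def rbf_kernel(A, B, sigma=1.0):
        # Gaussian kernel matrix between the rows of A and the rows of B.
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-sq / (2.0 * sigma**2))

    def kpca_fit(X, n_components=5, sigma=1.0):
        N = X.shape[0]
        K = rbf_kernel(X, X, sigma)
        one_n = np.ones((N, N)) / N
        # Centring in feature space, done implicitly on the Gram matrix:
        Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
        eigvals, eigvecs = np.linalg.eigh(Kc)
        idx = np.argsort(eigvals)[::-1][:n_components]
        lam = np.maximum(eigvals[idx], 1e-12)      # guard against round-off
        alpha = eigvecs[:, idx] / np.sqrt(lam)     # unit-norm feature-space eigenvectors
        return dict(X=X, K=K, alpha=alpha, sigma=sigma)

    def kpca_project(model, X_new):
        X, K, alpha, sigma = model["X"], model["K"], model["alpha"], model["sigma"]
        N = X.shape[0]
        k_new = rbf_kernel(X_new, X, sigma)        # (M, N) test kernel rows
        one_n = np.ones((N, N)) / N
        one_m = np.ones((X_new.shape[0], N)) / N
        # The same centring, applied to every test kernel row:
        k_c = k_new - one_m @ K - k_new @ one_n + one_m @ K @ one_n
        return k_c @ alpha                         # projections onto the components

The uncentred variant simply drops all the one_n / one_m terms, which is the simplification that the density view below ends up justifying.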

Around the same time, work by Williams and Seeger (2001) showed that KPCA can be derived as a finite-dimensional approximation to a density function (under the Gaussian kernel or other Parzen-window-type kernels). This interpretation does not require a mean offset. It also comes with the nice picture that KPCA maps the data onto a hyper-sphere; a mean offset would destroy that picture. How it generalizes to non-traditional kernels that are not so closely related to densities, I am not sure. The density-based interpretation also has nice connections to orthogonal series density estimation and to kernel entropy analysis (which provide ways to tweak the KPCA procedure).
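Concretely, the hyper-sphere picture rests on a one-line identity (assuming a Gaussian kernel, or any kernel with constant k(x, x)):

    \|\varphi(x)\|^2 = \langle \varphi(x), \varphi(x) \rangle = k(x, x) = e^{0} = 1,

so every mapped point has unit norm and lies on the unit sphere in feature space. Subtracting the empirical mean \mu_\varphi = \tfrac{1}{N} \sum_i \varphi(x_i) shifts the points so that they no longer share a common norm, which is exactly why centring breaks that picture.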