I am trying to reproduce a simple example of using kernel PCA. The objective is to separate out the points from two concentric circles.

Creating the data:

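library(dplyr)  # provides %>% and select()

# Points in polar coordinates: group A clusters at the origin, group B lies near the unit circle
circle <- data.frame(radius = rep(c(0, 1), 500) + rnorm(1000, sd = 0.05),
                     phi = runif(1000, 0, 2 * pi),
                     group = rep(c("A", "B"), 500))

# Convert to Cartesian coordinates and add a pure-noise third dimension, z
circle <- transform(circle,
                    x = radius * cos(phi),
                    y = radius * sin(phi),
                    z = rnorm(length(radius))) %>% select(group, x, y, z)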

TFRAC <- 0.75  # fraction of rows used for training

train <- sample(1:1000, TFRAC * 1000)

circle.train <- circle[train,]
circle.test <- circle[-train,]

> head(circle.train)
    group         x          y        z
491     A -0.034216 -0.0312062  0.70780
389     A  0.052616  0.0059919  1.05942
178     B -0.987276 -0.3322542  0.75297
472     B -0.808646  0.3962935 -0.17829
473     A -0.032227  0.0027470  0.66955
346     B  0.894957  0.3381633  1.29191

I split the data into training and test sets because I intend (once I get this working!) to test the resulting model.

In principle, kernel PCA should allow me to separate the two classes. Other discussions of this example have used the Radial Basis Function (RBF) kernel, so I adopted it too. In R, kernel PCA is implemented in the kernlab package.

library(kernlab)

circle.kpca <- kpca(~ ., data = circle.train[, -1],
                    kernel = "rbfdot", kpar = list(sigma = 10),
                    features = 1)

I requested only the first component and specified the RBF kernel. This is the result:

[Image: plot of the resulting first kernel principal component]
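
For reference, the component can also be inspected numerically rather than only in a plot. The following is a minimal sketch, not a fix: it rebuilds data shaped like the above in base R (the seed is arbitrary and only there for reproducibility), then uses kernlab's `rotated()` accessor for the training scores and the `predict()` method to project the held-out rows onto the same component.

```r
library(kernlab)

set.seed(42)  # arbitrary seed, an assumption of this sketch

# Rebuild data shaped like the question's, using base R only
n <- 1000
radius <- rep(c(0, 1), n / 2) + rnorm(n, sd = 0.05)
phi <- runif(n, 0, 2 * pi)
circle <- data.frame(group = rep(c("A", "B"), n / 2),
                     x = radius * cos(phi),
                     y = radius * sin(phi),
                     z = rnorm(n))

train <- sample(seq_len(n), 0.75 * n)
circle.train <- circle[train, ]
circle.test <- circle[-train, ]

circle.kpca <- kpca(~ ., data = circle.train[, -1],
                    kernel = "rbfdot", kpar = list(sigma = 10),
                    features = 1)

# Scores of the training points on the first kernel principal component
train.pc <- rotated(circle.kpca)

# Project the held-out points onto the same component
test.pc <- predict(circle.kpca, circle.test[, -1])

# Range of the training scores within each group
group.ranges <- tapply(train.pc[, 1], circle.train$group, range)
print(group.ranges)
```

If the two ranges overlap heavily, the first component is not separating the groups for that value of sigma.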

There has definitely been a major transformation of the data, but the transformed data is not what I was expecting (which would be a nice, clean separation of the two classes). I have tried fiddling with the value of the parameter sigma and, although the results do vary dramatically, I still didn't get what I was expecting. I assume that sigma is related to the parameter gamma mentioned here, possibly via the relationship given here (without the negative sign?).
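
To make the effect of sigma concrete, a few values can be swept and the separation scored numerically. This is a rough sketch under stated assumptions: the data are regenerated with an arbitrary seed and a smaller sample, and the separation score (difference of group means on the first component, in units of its standard deviation) is an ad hoc choice for illustration, not something kernlab provides.

```r
library(kernlab)

set.seed(1)  # arbitrary seed, an assumption of this sketch

# Smaller data set with the same shape as above
n <- 400
radius <- rep(c(0, 1), n / 2) + rnorm(n, sd = 0.05)
phi <- runif(n, 0, 2 * pi)
group <- rep(c("A", "B"), n / 2)
pts <- data.frame(x = radius * cos(phi),
                  y = radius * sin(phi),
                  z = rnorm(n))

sigmas <- c(0.1, 1, 10, 20)  # illustrative values only

sep <- sapply(sigmas, function(s) {
  fit <- kpca(~ ., data = pts, kernel = "rbfdot",
              kpar = list(sigma = s), features = 1)
  pc <- rotated(fit)[, 1]
  # Ad hoc score: gap between group means relative to the overall spread
  abs(mean(pc[group == "A"]) - mean(pc[group == "B"])) / sd(pc)
})

print(data.frame(sigma = sigmas, separation = sep))
```

A larger score suggests the groups sit further apart along the first component; it says nothing about whether later components separate them better.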

I'm pretty sure that I am making a naive rookie error here and I would really appreciate any pointers which would get me onto the right track.

Thanks, Andrew.

1 Answer

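Try sigma = 20; I think you will get the result you are looking for. The sigma in kernlab's rbfdot is what is usually called gamma for the RBF kernel: kernlab computes exp(-sigma * ||x - y||^2), so sigma multiplies the squared distance directly. It is therefore inversely related to the usual kernel width s in exp(-||x - y||^2 / (2 * s^2)), with sigma = 1 / (2 * s^2).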
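
The relationship can be checked numerically. A small sketch, with arbitrary example vectors and variable names chosen only for illustration: it compares the value returned by kernlab's `rbfdot` with the gamma-style formula and with the width-style formula.

```r
library(kernlab)

# Two arbitrary example points
x <- c(0.5, -0.2, 1.0)
y <- c(0.1, 0.3, 0.7)
sigma <- 10

# Value computed by kernlab's RBF kernel object
k <- rbfdot(sigma = sigma)
k.kernlab <- as.numeric(k(x, y))

# Gamma-style formula, with gamma equal to kernlab's sigma
k.manual <- exp(-sigma * sum((x - y)^2))

# Width-style formula exp(-d^2 / (2 * s^2)) with s = 1 / sqrt(2 * sigma)
s <- 1 / sqrt(2 * sigma)
k.width <- exp(-sum((x - y)^2) / (2 * s^2))

print(all.equal(k.kernlab, k.manual))
print(all.equal(k.manual, k.width))
```

So increasing kernlab's sigma corresponds to a narrower kernel, not a wider one.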