I am applying the functions from the flexclust package for hard competitive learning clustering, and I am having trouble with the convergence.

I am using this algorithm because I was looking for a method to perform weighted clustering, giving different weights to groups of variables. I chose hard competitive learning based on an answer to a previous question (Weighted Kmeans R).

I am trying to find the optimal number of clusters, and to do so I am using the function stepFlexclust with the following code:

new("flexclustControl") ## check the default values

fc_control <- new("flexclustControl")
fc_control@iter.max <- 500 ### maximum of 500 iterations
fc_control@verbose <- 1 # a positive integer turns progress reporting on
fc_control@tolerance <- 0.01

### I want to give more weight to the first 24 variables of the dataframe
my_weights <- rep(c(1, 0.064), c(24, 31))

set.seed(1908)
hardcl <- stepFlexclust(x = df, k = 7:20, nrep = 100, verbose = TRUE,
                        FUN = cclust, dist = "euclidean",
                        method = "hardcl", weights = my_weights, # parameters for hard competitive learning
                        control = fc_control,
                        multicore = TRUE)
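
Afterwards I plan to compare the solutions roughly along these lines (a sketch; k = 10 is just an example):

hardcl                          # summary of all runs, one line per k
plot(hardcl)                    # sum of within-cluster distances vs. number of clusters
m10 <- getModel(hardcl, "10")   # extract, e.g., the 10-cluster solution
summary(m10)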

However, the algorithm does not converge, even with 500 iterations. I would appreciate any suggestions. Should I increase the number of iterations? Is this an indicator that something else is going wrong, or did I make a mistake with the R commands?

Thanks in advance.

1 Answer


Two things answer my question (and also clarify how variable weighting works with k-means, or rather with hard competitive learning):

  • The weights are for observations (= rows of x), not variables (= columns of x), so using the weights argument of cclust to weight variables is wrong. A workaround is to scale the columns instead; see the first sketch after this list.

  • In hardcl or neural gas you need many more iterations than in standard k-means: one k-means iteration uses the complete data set to update the centroids, whereas one hard competitive learning iteration uses only a single observation. As a rough comparison to k-means, multiply the number of iterations by your sample size; see the second sketch after this list.
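
A minimal sketch of the column-scaling workaround: with Euclidean distance, multiplying each column by the square root of its weight makes its squared contribution to the distance proportional to the weight. Here df, my_weights and fc_control are the objects from the question; everything else is illustrative, not part of the flexclust API:

## scale each column by the square root of its weight
df_scaled <- sweep(as.matrix(df), 2, sqrt(my_weights), `*`)

set.seed(1908)
hardcl <- stepFlexclust(x = df_scaled, k = 7:20, nrep = 100,
                        FUN = cclust, dist = "euclidean",
                        method = "hardcl", control = fc_control)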
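
And a rough sketch of scaling the iterations; the multiplier is just an illustration of the rule of thumb above, not a flexclust recommendation:

fc_control <- new("flexclustControl")
fc_control@tolerance <- 0.01
## one hardcl iteration touches a single observation, so aim for the
## equivalent of ~500 k-means passes over the data
fc_control@iter.max <- 500 * nrow(df)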