Use `purrr::map` with k-means

Question

I thought that this call

kmeans(x = matrix(1:50, 5), centers = 2, iter.max = 10)

could be written as:

matrix(1:50, 5) %>%
  map(~ kmeans(x = .x, centers = 2, iter.max = 10))

but the second version fails with:

Error in sample.int(m, k) : 
  cannot take a sample larger than the population when 'replace = FALSE'

How do I use kmeans in combination with purrr::map()?

Why do you need map here? matrix(1:50, 5) %>% kmeans(., centers = 2, iter.max = 10). A matrix is a vector with dim attributes. When you do map, it goes through each single observation. — akrun
@akrun because in my original example I have several matrixes (scaled, with/without certain variables,etc.), and I would like to compare the results of the clustering against each other. — Dambo
Not sure I get it. If you have several matrices in a list, then map can be applied — akrun
Your approach would work if it is in a list i.e. list(matrix(1:50, 5), matrix(51:100, 5)) %>% map( ~kmeans(x = .x, centers = 2, iter.max = 10)) — akrun

akrun · Accepted Answer · 2017-10-19T19:54:21

A matrix is itself a vector with a dim attribute. So when we apply map directly to the matrix, it iterates over the individual elements instead of passing the whole matrix to kmeans. Instead, place the matrix in a list:

list(matrix(1:50, 5)) %>%
  map(~ kmeans(x = .x, centers = 2, iter.max = 10))
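To see why the list wrapper matters, here is a minimal base R sketch. The variable name `m` is only for illustration. It checks that the matrix is an atomic vector of 50 values, while a list holding it has a single element, the whole matrix.

```r
m <- matrix(1:50, 5)

# A matrix is an atomic vector of 50 values plus a dim attribute.
is.atomic(m)             # TRUE
length(m)                # 50
dim(m)                   # 5 10

# A list containing the matrix has exactly one element: the whole matrix.
length(list(m))          # 1
is.matrix(list(m)[[1]])  # TRUE
```

Called on a single number with centers = 2, kmeans tries to sample two starting centers from one data point, which is where the sample.int error in the question comes from.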

Note that for a single matrix, we don't need map:

matrix(1:50, 5) %>%
  kmeans(., centers = 2, iter.max = 10)

It becomes useful when we have a list of matrices:

list(matrix(1:50, 5), matrix(51:100, 5)) %>%
  map(~ kmeans(x = .x, centers = 2, iter.max = 10))
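For the use case mentioned in the comments (comparing clusterings of several variants of the same data), a sketch along these lines may help. The list names `raw` and `scaled` and the variables `mats` and `fits` are hypothetical, and the call to set.seed is only there to make the random starting centers reproducible.

```r
library(purrr)

set.seed(42)  # kmeans() picks random starting centers

# Hypothetical named variants of one data set
mats <- list(
  raw    = matrix(1:50, 5),
  scaled = scale(matrix(1:50, 5))
)

# One kmeans fit per matrix; the list names carry over to the results
fits <- map(mats, ~ kmeans(x = .x, centers = 2, iter.max = 10))

# Cluster assignment of each row, per variant
map(fits, "cluster")

# Total within-cluster sum of squares, one number per variant
map_dbl(fits, "tot.withinss")
```

Naming the list elements keeps the results labelled, so fits$raw and fits$scaled can be inspected side by side. The within-cluster sums of squares are on different scales for the raw and scaled data, so they are shown here only to illustrate extracting a component from each fit.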
