
I want to run a k-means analysis in which some of my variables are considered more important than the others. I found the kmeansW function, but after reading its help page I am a bit confused:

Usage

kmeansW(x, centers, weight = rep(1,nrow(x)), iter.max = 10, nstart = 1)

Arguments

x

A numeric vector, matrix or data frame.

centers

Either the number of clusters or a set of initial (distinct) cluster centres. If a number, a random set of (distinct) rows in x is chosen as the initial centres.

weight

weight of the elements of x. by default the same.

iter.max

The maximum number of iterations allowed.

nstart

If centers is a number, how many random sets should be chosen?

Do I understand correctly that it weights the elements of the data frame (rows) and not the variables (columns)? If so, what other approach would you recommend for this problem?

Why not just transpose your matrix? - Konrad Rudolph
You can just duplicate the variables whose weight you want to increase, as many times as needed. - agenis
The function should weight the variables differently. The k-means algorithm relies on Euclidean distances between samples; the weighting should reflect how strongly each variable counts when computing pairwise distances. - PejoPhylo
@PrzeM, you should also be able to simply multiply each vector of interest by a constant (corresponding to its weight). With this modified matrix, you can run a standard k-means algorithm. - PejoPhylo
The professor supervising my project told me to multiply the scaled values by their weights, not by the square roots of the weights as I had proposed to him. I am going to stick with his decision, as I still have other problems to solve and very little time left to complete the project. - PrzeM

1 Answer


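You can simply multiply your features by your desired weights after scaling the data. This stretches the Euclidean distances between observations along the weighted features, so those features have more influence on the clustering, which is what you asked for. Note that multiplying a feature by w scales its contribution to the squared Euclidean distance by w^2, so multiply by sqrt(w) instead if you want the weight to apply to the squared distance.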
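As a rough illustration of the scale-then-multiply approach, here is a minimal sketch using only base R (`scale`, `sweep`, and `stats::kmeans`). The `iris` data, the weight vector `w`, and the choice of 3 clusters are placeholders I picked for the example, not anything from the question; substitute your own data and weights.

```r
# Toy data: built-in iris measurements (an assumption; use your own numeric data)
x <- as.matrix(iris[, 1:4])

# Hypothetical weights, one per column; a larger value gives that variable more influence
w <- c(Sepal.Length = 1, Sepal.Width = 1, Petal.Length = 3, Petal.Width = 3)

# 1. Standardize so that every variable starts on the same footing
x_scaled <- scale(x)

# 2. Multiply each column by its weight
x_weighted <- sweep(x_scaled, 2, w, `*`)

# 3. Run ordinary k-means on the weighted matrix
set.seed(42)
fit <- kmeans(x_weighted, centers = 3, nstart = 25)

# Sanity check: the squared distance between two rows of the weighted matrix
# equals the sum of w^2 times the squared differences of the scaled values
d_weighted <- sum((x_weighted[1, ] - x_weighted[51, ])^2)
d_manual   <- sum(w^2 * (x_scaled[1, ] - x_scaled[51, ])^2)
all.equal(d_weighted, d_manual)
```

The last lines show why the choice between `w` and `sqrt(w)` matters: with plain multiplication, each variable enters the squared distance with factor `w^2`. Replacing `w` with `sqrt(w)` in the `sweep` call would make the factor `w` instead.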