K-means is well defined only for Euclidean spaces, where the distance between vectors A and B is expressed as
|| A - B || = sqrt( SUM_i (A_i - B_i)^2 )
thus if you want to "weight" a particular feature, you would want something like
|| A - B ||_W = sqrt( SUM_i w_i(A_i - B_i)^2 )
which makes feature i much more important (if w_i > 1) - you get a bigger penalty for having a different value there. In bag-of-words / set-of-words terms this simply means that if two documents differ in the count of this particular word, they are treated as much more different than documents which differ only in other words.
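To make this concrete, here is a tiny NumPy illustration; the 3-word vocabulary, the counts and the 10x weight are all made up for the example:

```python
# Made-up vocabulary ["apple", "banana", "cherry"], with "apple" boosted 10x.
import numpy as np

w = np.array([10.0, 1.0, 1.0])      # weight vector: "apple" counts 10x more

doc_a = np.array([1.0, 0.0, 1.0])   # bag-of-words counts
doc_b = np.array([0.0, 0.0, 1.0])   # differs from doc_a only in "apple"
doc_c = np.array([1.0, 1.0, 1.0])   # differs from doc_a only in "banana"

def weighted_dist(x, y, w):
    # || x - y ||_W = sqrt( SUM_i w_i (x_i - y_i)^2 )
    return np.sqrt(np.sum(w * (x - y) ** 2))

print(weighted_dist(doc_a, doc_b, w))  # ~3.16: they differ in the boosted word
print(weighted_dist(doc_a, doc_c, w))  # 1.0:   they differ in an ordinary word
```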
So how can you enforce it? Well, basic maths is all you need! You can easily see that
|| A - B ||_W = || sqrt(W)*A - sqrt(W)*B ||
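A quick numeric sanity check of this identity (the vectors and weights below are arbitrary, just to show both sides agree):

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([0.5, 4.0, 1.0])
w = np.array([10.0, 1.0, 1.0])   # feature 0 weighted 10x

lhs = np.sqrt(np.sum(w * (A - B) ** 2))                # || A - B ||_W
rhs = np.linalg.norm(np.sqrt(w) * A - np.sqrt(w) * B)  # || sqrt(W)*A - sqrt(W)*B ||
print(lhs, rhs)  # both print the same value (~3.2404)
```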
In other words: take your tfidf transformer (or whatever you use to map your text to a fixed-size vector), check which features correspond to the words you are interested in, create a vector of ones (with size equal to the number of dimensions), increase the values for the words you care about (for example 10x), and take the square root of it. Then simply preprocess all your data by multiplying it point-wise, with broadcasting (np.multiply), by this weight vector. That's all you need; your words will now be more important in this well-defined manner. From a mathematical perspective this introduces a Mahalanobis distance instead of the Euclidean one, with the inverse covariance matrix equal to diag(w) (thus a diagonal Gaussian used as the generator of your norm).
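Putting the whole recipe together, here is a minimal sketch using scikit-learn's TfidfVectorizer and KMeans; the document list, the boosted words and the 10x factor are placeholders you would replace with your own data:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold stock after the markets opened",
]
boosted_words = {"cat", "cats"}   # the words you care about
boost = 10.0                      # how much more they should matter

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)            # sparse matrix, shape (n_docs, n_features)

# weight vector: ones everywhere, `boost` for the features of the chosen words
w = np.ones(X.shape[1])
for word in boosted_words:
    idx = vectorizer.vocabulary_.get(word)    # column index of the word, if present
    if idx is not None:
        w[idx] = boost

# point-wise multiplication by sqrt(w), broadcast over rows
Xw = X.multiply(np.sqrt(w).reshape(1, -1)).tocsr()

# plain Euclidean k-means on the rescaled data == weighted k-means on the original
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xw)
print(labels)
```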