I am clustering a single document with K-means. I am currently preparing the data, representing its N sentences as vectors.
However, if I understand correctly, the K-means algorithm creates k clusters based on the Euclidean distance to k center points, regardless of the order of the sentences.
My problem is that I want to preserve the order of the sentences and take it into account when clustering.
Let S = {S_1, ..., S_n} be a sequence of n vectors representing the sentences, where S_1 is sentence 1, S_2 is sentence 2, and so on.
I want each cluster to be a contiguous range of sentences: K_1 = S[1..i], K_2 = S[i+1..j], etc.
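To make that constraint concrete, here is a small sketch of a check on the cluster labels, in sentence order. The helper name `is_contiguous` is made up for illustration; it only tests whether each label occupies one unbroken run of positions, which is the property I am after.

```python
def is_contiguous(labels):
    """Return True if each cluster label occupies one unbroken run of positions."""
    seen = set()
    previous = None
    for label in labels:
        if label != previous:
            if label in seen:
                # This label already ended earlier and is now reappearing.
                return False
            seen.add(label)
            previous = label
    return True


print(is_contiguous([0, 0, 1, 1, 2]))  # True: three consecutive blocks
print(is_contiguous([0, 1, 0, 2, 2]))  # False: label 0 is split in two
```

Plain K-means can easily return labels like the second case, since it never looks at the positions.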
I thought about reducing the vectors to 1D and adding each sentence's index to the reduced value, but I am not sure it will help. Maybe there is a smarter way?
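For reference, a minimal sketch of a variation on this idea: instead of reducing to 1D, it appends the scaled sentence index as one extra coordinate, so the Euclidean distance also penalizes sentences that are far apart in the text. The helper `add_position_feature`, the `weight` parameter, and the toy vectors are all assumptions for illustration, not part of any library.

```python
import math


def add_position_feature(vectors, weight=1.0):
    """Append weight * i / (n - 1) to the vector of sentence i (hypothetical helper)."""
    n = len(vectors)
    denom = max(n - 1, 1)  # avoid division by zero for a single sentence
    return [list(v) + [weight * i / denom] for i, v in enumerate(vectors)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data: the first and last sentences have identical content vectors.
sentences = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]

plain = euclidean(sentences[0], sentences[3])
augmented = add_position_feature(sentences, weight=2.0)
with_position = euclidean(augmented[0], augmented[3])

print(plain, with_position)  # 0.0 2.0
```

As far as I can tell, this only discourages distant sentences from sharing a cluster: with a moderate `weight` nothing forces the clusters to be contiguous, and with a very large `weight` the content is effectively ignored.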