I am using pre-trained fastText (https://fasttext.cc/) vectors to cluster short chat messages. Each message is represented by a single vector: the average of the vectors of its tokens.
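To make the setup concrete, here is a minimal sketch of the averaging step. It uses a toy dictionary in place of the real fastText model. The names `TOY_VECTORS` and `message_vector` are illustrative and not part of fastText, and skipping unknown tokens is an assumption of this sketch.

```python
from typing import Dict, List

# Hypothetical toy embedding table standing in for pre-trained fastText vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "hello": [1.0, 0.0, 2.0],
    "world": [3.0, 2.0, 0.0],
}


def message_vector(tokens: List[str],
                   vectors: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of the known tokens in a message.

    Tokens missing from the table are skipped (an assumption of this sketch);
    a message with no known tokens maps to the zero vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over the vectors dimension by dimension.
    return [sum(component) / len(known) for component in zip(*known)]


avg = message_vector(["hello", "world"], TOY_VECTORS, dim=3)
print(avg)  # [2.0, 1.0, 1.0]
```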
I started with K-means, but I am now wondering whether it is the right choice. For instance, K-means uses Euclidean distance, whereas similarity between word embeddings is usually computed with cosine similarity.
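One thing that may be relevant: for unit-length vectors, squared Euclidean distance and cosine similarity are tied by the identity ||a - b||^2 = 2 - 2 cos(a, b). So L2-normalizing the message vectors before running K-means may bring it closer to a cosine-based clustering, although the centroids themselves are not renormalized. The sketch below only checks the identity numerically on two made-up vectors; all function names are illustrative.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_euclidean(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two made-up message vectors, for illustration only.
a = [2.0, 1.0, 1.0]
b = [0.5, 3.0, 1.0]

cos_ab = cosine_similarity(a, b)
sq_dist_unit = squared_euclidean(l2_normalize(a), l2_normalize(b))

# On unit vectors, squared Euclidean distance is a decreasing function of
# cosine similarity, so both induce the same nearest-neighbor ordering.
identity_holds = math.isclose(sq_dist_unit, 2.0 - 2.0 * cos_ab, abs_tol=1e-9)
print(identity_holds)
```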
How should I choose the right clustering method in this case?