
Say that I have many vectors, some of which are:

a: [1,2,3,4,3,2,1,0,0,0,0,0]
b: [5,5,5,5,5,10,20,30,5,10]
c: [1,2,3,2,1,0,0,0,0,0,0,0]

We can see similar patterns between vectors a and c. My question is whether it is possible to classify these two into the same cluster and classify b into another cluster. I would rather not use algorithms like KMeans, because the raw values are not what matters here, only the patterns are. Any advice is welcome, especially solutions in Python. Thanks.
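To illustrate what I mean by "pattern" (Pearson correlation here is just a rough, hypothetical measure, not a requirement), the correlation between a and c is fairly high, while between a and b (padded to equal length) it is not:

import numpy as np

a = [1,2,3,4,3,2,1,0,0,0,0,0]
b = [5,5,5,5,5,10,20,30,5,10] + [0, 0]  # padded with zeros to match the length of a and c
c = [1,2,3,2,1,0,0,0,0,0,0,0]

print(np.corrcoef(a, c)[0, 1])  # fairly high -- similar shape
print(np.corrcoef(a, b)[0, 1])  # much lower -- different shape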

1 Answer


You may want to use a Support Vector Classifier (SVC), as it produces boundaries between classes based on the patterns (generalized directions) of the points in each class, rather than the naive distance between points (which is what KMeans and Spectral Clustering use). You will, however, have to construct the labels Y yourself, since SVC is a supervised method. Here is an example:

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

a = [1,2,3,4,3,2,1,0,0,0,0,0]
b = [5,5,5,5,5,10,20,30,5,10]
c = [1,2,3,2,1,0,0,0,0,0,0,0]

# d is a new, unseen vector (same length as the padded training vectors)
# that we will classify at the end.
d = [100,2,300,4,100,0,0,0,0,0,0,0]

vectors = [a, b, c]

# The vectors have different lengths. Pad the shorter ones with zeros
# so that they all have the same number of dimensions.
L = max(len(elem) for elem in vectors)
imputed = []
for elem in vectors:
    imputed.append(elem + [0] * (L - len(elem)))

print(imputed)

X = np.array(imputed)
print(X)
Y = np.array([0, 1, 0])  # a and c get label 0, b gets label 1

clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf.fit(X, Y)

# Predict which class the unseen vector d belongs to.
print(clf.predict(np.array([d])))
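If you would rather keep the whole thing unsupervised, here is a rough sketch of one possible alternative (the per-vector standardisation and the choice of 2 clusters are assumptions, not requirements): z-score each vector individually so that only its shape matters, then apply a distance-based clustering such as AgglomerativeClustering:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Z-score each padded vector individually: this removes its scale and offset,
# so the distance between rows now reflects only the shape of the series.
X = np.array(imputed, dtype=float)
X_shape = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Assumed number of clusters (2); average linkage on the shape-normalised rows.
model = AgglomerativeClustering(n_clusters=2, linkage='average')
labels = model.fit_predict(X_shape)
print(labels)  # a and c should end up in one cluster, b in the other

With each row standardised this way, the Euclidean distance between two rows is (up to a constant) equivalent to 1 minus their Pearson correlation, which is exactly the kind of "pattern" similarity you are after.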