
I am trying to use k-medoids to cluster some trajectory data I am working with (multiple points along the trajectory of an aircraft). I want to cluster these into a set number of clusters (as I know how many types of paths there should be).

I have found that k-medoids is implemented inside the pyclustering package, and am trying to use that. I am technically able to get it to cluster, but I do not know how to control the number of clusters. I originally thought it was directly tied to the number of elements inside what I called initial_medoids, but experimentation shows that it is more complicated than this. My relevant code snippet is below.

Note that traj_lst holds a list of lists; each inner list corresponds to a single trajectory. D is the matrix of pairwise Hausdorff distances between trajectories.

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(u, v):
    # Symmetric Hausdorff distance: the larger of the two directed distances
    d = max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
    return d

traj_count = len(traj_lst)
D = np.zeros((traj_count, traj_count))

# Fill the symmetric pairwise distance matrix (the diagonal stays 0)
for i in range(traj_count):
    for j in range(i + 1, traj_count):
        distance = hausdorff(traj_lst[i], traj_lst[j])
        D[i, j] = distance
        D[j, i] = distance

from pyclustering.cluster.kmedoids import kmedoids
initial_medoids = [104, 345, 123, 1]

kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()[0]

num_clusters = len(np.unique(cluster_lst))
print('There were %i clusters found' % num_clusters)
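The Hausdorff distance matrix can be sanity-checked on toy data without SciPy. This is a minimal NumPy-only sketch that is intended to give the same value as max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0]); the names hausdorff_np and distance_matrix are hypothetical, and the two tiny trajectories are made up for illustration.

```python
import numpy as np

def hausdorff_np(u, v):
    """Hypothetical helper: symmetric Hausdorff distance between two point sets.

    Rows are points. Intended to match the SciPy-based hausdorff() above.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # All pairwise Euclidean distances, shape (len(u), len(v)).
    pairwise = np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1)
    forward = pairwise.min(axis=1).max()   # farthest u-point from its nearest v-point
    backward = pairwise.min(axis=0).max()  # farthest v-point from its nearest u-point
    return max(forward, backward)

def distance_matrix(trajectories):
    """Hypothetical helper: symmetric matrix of pairwise Hausdorff distances."""
    n = len(trajectories)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = hausdorff_np(trajectories[i], trajectories[j])
    return D

# Two toy 2-D "trajectories" of different lengths.
u = [[0, 0], [1, 0]]
v = [[0, 0], [0, 3]]
print(hausdorff_np(u, v))  # 3.0
```

This brute-force version stores every pairwise point distance, so memory grows with len(u) times len(v); for long trajectories the SciPy function is likely the better choice.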

I have a total of 1900 trajectories, and the above code finds 1424 clusters. I expected to control the number of clusters through the length of initial_medoids, since I did not see any option for passing the number of clusters to the program, but the two seem unrelated. Could anyone point out the mistake I am making? How do I choose the number of clusters?

What is the result that you get? - Has QUIT--Anony-Mousse
I'm sorry I didn't mention that previously. I've now edited the post to include that information. I have a total of 1900 trajectories, and the above code finds 1424 clusters. I can change the number of clusters it finds somewhat by changing the initial_medoids, but I cannot understand how to get a reasonable number of clusters (like 10). - Chiron
Maybe pyclustering is broken. Have you tried other tools such as ELKI, R, ...? - Has QUIT--Anony-Mousse
Thank you. I have not tried any other tools. I need to incorporate this into a Python program. Do you know of any other Python packages I could import? I'm not familiar with many other languages at this time. - Chiron
Maybe you are also misinterpreting the returned value. - Has QUIT--Anony-Mousse

2 Answers


To obtain the clusters, call get_clusters():

cluster_lst = kmedoids_instance.get_clusters()

and not get_clusters()[0], which returns only the list of object indexes in the first cluster:

cluster_lst = kmedoids_instance.get_clusters()[0]

And you are correct: you can control the number of clusters through initial_medoids, with one cluster per initial medoid.
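The point above can be illustrated without pyclustering. This is a minimal sketch, not pyclustering's implementation: a simple alternating (Voronoi-iteration) k-medoids in NumPy that works on a precomputed distance matrix such as D and returns clusters in the same list-of-index-lists shape that get_clusters() returns. The helper kmedoids_from_distances and the toy 1-D data are hypothetical. It shows that the number of clusters equals len(initial_medoids), while len(np.unique(clusters[0])) only counts the members of the first cluster. Note also that the question's snippet builds D but passes traj_lst to kmedoids; recent pyclustering releases appear to document a data_type='distance_matrix' keyword for passing a distance matrix instead, but check the documentation for your installed version.

```python
import numpy as np

def kmedoids_from_distances(D, initial_medoids, max_iter=100):
    """Hypothetical helper: alternating k-medoids on a precomputed distance matrix.

    Returns (clusters, medoids); clusters is a list of index lists, one per
    medoid, i.e. the same shape that get_clusters() returns.
    """
    D = np.asarray(D, dtype=float)
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        # Assign every point to its nearest current medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = []
        for k in range(len(medoids)):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                # Empty cluster: keep the old medoid so k stays fixed.
                new_medoids.append(medoids[k])
                continue
            # New medoid: the member with the smallest summed distance to its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    clusters = [np.flatnonzero(labels == k).tolist() for k in range(len(medoids))]
    return clusters, medoids

# Toy 1-D data: three groups of different sizes.
x = np.array([0, 1, 2, 3, 4, 10, 11, 20, 21], dtype=float)
D_toy = np.abs(x[:, None] - x[None, :])

clusters, medoids = kmedoids_from_distances(D_toy, initial_medoids=[0, 5, 7])
print(len(clusters))                # 3, one cluster per initial medoid
print(len(np.unique(clusters[0])))  # 5, only the size of the first cluster
print(clusters)                     # [[0, 1, 2, 3, 4], [5, 6], [7, 8]]
```

With real data, the analogous count of 1424 would be the number of trajectories assigned to the first cluster, not the number of clusters.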


It is true that you can control the number of clusters: it corresponds to the length of initial_medoids.

The documentation is not clear about this. The get_clusters function "Returns list of medoids of allocated clusters represented by indexes from the input data". So this function does not return cluster labels; for each cluster, it returns the indexes of rows in your original (input) data.

Please check the length of cluster_lst in your example, using .get_clusters() and not .get_clusters()[0] as annoviko suggested. In your case it should be 4: a list of four clusters, each containing the indexes of rows in your original data.
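If what you want is one cluster label per trajectory (the form that len(np.unique(...)) in the question assumes), the list of index lists can be converted. This is a minimal sketch: clusters_to_labels is a hypothetical helper, and the clusters value is made-up example output, not real pyclustering output.

```python
import numpy as np

def clusters_to_labels(clusters, n_points):
    """Hypothetical helper: turn a list of index lists (the shape returned by
    get_clusters()) into one integer label per input row."""
    labels = np.full(n_points, -1, dtype=int)  # -1 marks rows not in any cluster
    for label, indexes in enumerate(clusters):
        labels[indexes] = label
    return labels

# Made-up output for 7 trajectories split into 3 clusters.
clusters = [[0, 3, 4], [1, 6], [2, 5]]
labels = clusters_to_labels(clusters, n_points=7)
print(labels)                  # [0 1 2 0 0 2 1]
print(len(np.unique(labels)))  # 3
```

Once the rows carry labels, counting unique labels gives the number of clusters, which here matches len(clusters).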

To get, for example, data from the first cluster, use:

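from pyclustering.cluster.kmedoids import kmedoids

kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()
# traj_lst is a plain list, so pick the trajectories index by index
traj_lst_first_cluster = [traj_lst[i] for i in cluster_lst[0]]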