Implementing a simple clustering algorithm in Prolog

Question

My goal is implementing a simple clustering algorithm in Prolog.

Assume you have a certain amount of locations. Each position has a certain ID and they look like 'position(X,Y,ID)'.

I had the following idea: Basically you take the first ID and its location and you add this ID to a list. You then compare the location of this ID to all other possible locations. Each ID with a location that is within a certain distance to the original location is then also added to the list. The next step is to do the exact same thing for all new IDs that have been added to the list. And then you do the same thing for IDs that have been added in the previous step, et cetera.

After there are no more possibilities you have a list of IDs that form a cluster. For example 1, 2, 3, 6 and 8.

For the first cluster you started at ID 1. Normally you'd then start at ID 2 and check in which cluster that one is. But it already is in the first cluster, there is no reason to look at this one again. Just like we don't have to look at the values 3, 6 and 8 either. Because we simply know that looking at these IDs would give the exact same cluster.

So how I imagine it is that there is a predicate that returns a list of lists, basically all clusters. This predicate uses another one that actually generates single clusters.

I've already tried some things but it didn't really work.

A possible list of locations:

position(1,1,1)
position(3,2,2)
position(2,4,3)
position(4,6,4)
position(5,4,5)
position(10,3,6)
position(12,2,7)
position(13,6,8)
position(16,2,9)

Assuming a distance of 3, there should be four clusters. 1, 2, 3, 4 and 5; 6 and 7; 8; 9.

First a simple predicate which checks whether two IDs are within a certain distance of each other:

checkDistance(ID1,ID2,Distance) :-
    position(X1,Y1,ID1),
    position(X2,Y2,ID2),
    abs(X2 - X1) =< Distance,
    abs(Y2 - Y1) =< Distance.

The following code creates a list of IDs close to the initial ID:

compareAllPos(ID,Val,[],Distance) :- 
    \+ checkDistance(ID,Val,Distance).

compareAllPos(ID,Val,[Val|LS],Distance) :-
    checkDistance(ID,Val,Distance),
    Val2 is Val+1,
    compareAllPos(ID,Val2,LS,Distance).

It gives the following output:

?- compareAllPos(1,1,List,3).
List = [1, 2, 3] ;

This only solves the first part. Now I have to go through each value that is added to the list. I did that by adding another compareAllPos:

compareAllPos(ID,Val,[Val|LS],Distance) :-
    checkDistance(ID,Val,Distance),
    Val2 is Val+1,
    append(LS1,LS2,LS),
    compareAllPos(Val,1,LS1,Distance),
    compareAllPos(ID,Val2,LS2,Distance).

The next problem is the program is now stuck in an infinite loop, so a check is needed to make sure that the ID hasn't been seen yet.

I had the following idea:

compareAllPos2(ID,Val,_,[],Distance) :- 
    \+ checkDistance(ID,Val,Distance).

compareAllPos2(ID,_,Visited,[],_) :-
    member(ID,Visited).

compareAllPos2(ID,Val,Visited,[Val|LS],Distance) :-
    checkDistance(ID,Val,Distance),
    Val2 is Val+1,
    append(LS1,LS2,LS),
    append(ID,Visited,Visited1),
    compareAllPos2(Val,1,Visited1,LS1,Distance),
    append([],Visited1,Visited2),
    compareAllPos2(ID,Val2,Visited2,LS2,Distance).

I would use the following line to call it:

compareAllPos2(1,1,[],List,3).

This is where I get stuck. The above code does not work correctly and I have not yet figured out the right way to solve this. If that code works it should return a single cluster based on an ID. The last step would then be to look at other possible clusters.

CapelliC CapelliC · Accepted Answer · 2013-04-16T10:09:44

after you add the last compareAllPos/4, you'll get

?- compareAllPos(1,1,List,3).
List = [1, 2, 3] ;
List = [1, 2, 3, 1, 2, 3, 4, 5] ;
List = [1, 2, 3, 1, 2, 3, 4, 5] ;
List = [1, 2, 3, 1, 2, 3, 4, 5] ;
List = [1, 2, 3, 1, 2, 3, 4, 5] ;
List = [1, 2, 3, 1, 2, 1, 2, 3, 3|...] ;
List = [1, 2, 3, 1, 2, 1, 2, 3, 3|...] ;
...

I think could be the cause of looping, and you should keep it simpler. I would write

checkDistance(ID1,ID2,Distance) :-
    position(X1,Y1,ID1),
    position(X2,Y2,ID2),
    ID1 \= ID2,
    abs(X2 - X1) =< Distance,
    abs(Y2 - Y1) =< Distance.

clusters(Distance, Cs) :-
    once(position(_,_,First)),
    clusters(Distance, First, [], Cs).

clusters(Distance, Pivot, SoFar, Cs) :-
    findall(Tentative,
        (   pos_not_clustered(SoFar, Tentative),
            checkDistance(Pivot, Tentative, Distance)
        ),
        Cluster),
    Updated = [[Pivot|Cluster]|SoFar],
    (   pos_not_clustered(Updated, NextPivot)
    ->  clusters(Distance, NextPivot, Updated, Cs)
    ;   Cs = Updated
    ).

pos_not_clustered(Clusters, ID) :-
    position(_,_,ID),
    maplist(not_member(ID), Clusters).

not_member(E, L) :- \+ memberchk(E, L).

since it yields

?- clusters(3,C).
C = [[9], [6, 7, 8], [4, 5], [1, 2, 3]].

edit sorry for the buggy code, I've changed to this

clusters(Distance, Cs) :-
    once(position(_,_,First)),
    clusters(Distance, [[First]], Cs).

clusters(Distance, SoFar, Cs) :-
    (   pos_not_clustered(SoFar, Tentative),
        add_to_cluster(SoFar, Tentative, Distance, Updated)
    ->  clusters(Distance, Updated, Cs)
    ;   Cs = SoFar
    ).

pos_not_clustered(Clusters, ID) :-
    position(_,_,ID),
    maplist(not_member(ID), Clusters).

not_member(E, L) :- \+ memberchk(E, L).

add_to_cluster([C|Cs], Tentative, Distance, Updated) :-
    member(Id, C),
    checkDistance(Id, Tentative, Distance), !,
    Updated = [[Tentative|C]|Cs].
add_to_cluster([C|Cs], Tentative, Distance, [C|Rs]) :-
    add_to_cluster(Cs, Tentative, Distance, Rs).
add_to_cluster([], Tentative, _Distance, [[Tentative]]).

now I get

?- clusters(3,C).
C = [[5, 4, 3, 2, 1], [8, 7, 6], [9]].

Implementing a simple clustering algorithm in Prolog

1 Answers