How to remove duplicates from a list in SWI-Prolog?

9

votes

I need to write a predicate remove_duplicates/2 that removes duplicate elements form a given list. For example:

?- remove_duplicates([a,a,b,c,c], List). List = [a,b,c] Yes

Please keep in mind I'm only learning SWI-Prolog for two days and only understand the basics of Prolog. This is what I have at the moment:

remove_duplicates([H | T], List) :- member(H, T), append(T, [], List1).

This works for the list [a,a,b,c] but not for lists where two elements in the tail are the same. I figured I somehow have to remove the Head to a temporary list, create a new Head and just repeat the predicate. I have no idea how to do that. Also, when the Head is not in the Tail, for example with lists like [a,b,b,c], the terminal just says False, because member(H, T) is not true.

Any ideas?

prolog

Possible duplicate of Remove duplicates in list (Prolog) – lambda.xy.x

@lambda.xy.x There are many similar question and answers on the [prolog] tag, however, I am not sure if the proposed (and accepted) solutions are necessarily that great. I don't claim my answer below is any better, but at least it shows a different approach. – user1812457

You are right, there are a lot of impure solutions to this problem on SO. I picked the question where at least one solution (the one of @repeat ) worked for non-ground terms. I consider your solution better suited to beginners though because it doesn't rely on reification. – lambda.xy.x

1

votes

A simple and nice code to remove duplicates would be:

remove_duplicates([], []).

remove_duplicates([Head | Tail], Result) :-
    member(Head, Tail), !,
    remove_duplicates(Tail, Result).

remove_duplicates([Head | Tail], [Head | Result]) :-
    remove_duplicates(Tail, Result).

As described in Introduction to Prolog by Ulle Endriss

11

votes

To remove duplicates, if the order doesn't matter, the easiest is to use sort/2:

?- sort([a,a,b,b,c], X).
X = [a, b, c].

?- sort([c,c,a,a,b], X).
X = [a, b, c].

Of course, you see that the original order of the elements is lost. More importantly, this only guarantees to be correct if the list you are sorting is already ground (no free variables in it). Consider this small example:

?- sort([X,Y], [a,a]).
X = Y, Y = a.

Doesn't feel exactly right if your goal is to remove duplicates....

So you might as well have written:

must_be(ground, List), sort(List, Unique).

You could also do it yourself, and keep the original order. But then, you need to keep record of the elements you have seen so far. For example, you can keep them in an extra list:

The list of element seen so far is empty at the beginning

list_unique(List, Unique) :-
    list_unique_1(List, [], Us).

list_unique_1([], _, []).
list_unique_1([X|Xs], So_far, Us) :-
    list_unique_2(X, Xs, So_far, Us).

If all the elements seen so far are different from X, put it in the list of unique elements and add it to the list of elements seen so far.

list_unique_2(X, Xs, So_far, [X|Us]) :-
    maplist(dif(X), So_far),
    list_unique_1(Xs, [X|So_far], Us).

If the element has been seen so far, just skip it.

list_unique_2(X, Xs, So_far, Us) :-
    memberchk(X, So_far),
    list_unique_1(Xs, So_far, Us).

This is the most straight-forward way to do it. There are other cleverer ways to do it that might have a better complexity, but this is the easiest to write.

5

votes

The are a few problems with your predicate:

remove_duplicates([H | T], List) :-
  member(H, T),
  append(T, [], List1).

First you should have a recursive predicate ,you check for the head of the list but you don't recursively check for the tail of the list:

so you could change it to:

remove_duplicates([H | T], List) :- 
      member(H, T),
      remove_duplicates( T, List).

The above calls recursively for the tail.

Second you have to decide what happens if h is not member of the list so you need to add another clause :

remove_duplicates([H | T], [H|T1]) :- 
       \+member(H, T), 
       remove_duplicates( T, T1).

The above checks if H exists in T and if not forces the first element of the list you want to return to be the element H:

if this is not so clear, the above is equal to this:

  remove_duplicates([H | T], List) :- 
           \+member(H, T), 
           List=[Head|Tail],
           Head=H,
           remove_duplicates( T, Tail).

This is not very elegant ,it is just to understand how the previous works.

Last you need a base to your recursion for the empty list:

remove_duplicates([],[]).

So the predicate remove_duplicates according to the above clauses should be:

remove_duplicates([],[]).

remove_duplicates([H | T], List) :-    
     member(H, T),
     remove_duplicates( T, List).

remove_duplicates([H | T], [H|T1]) :- 
      \+member(H, T),
      remove_duplicates( T, T1).

Though you have to notice that the above would keep only one time every element for example:

remove_duplicates([a,a,b,c],L).
L = [a, b, c] ;
false.

This is ok but check the example below :

remove_duplicates([a,a,b,a,c],L).
L = [b, a, c] ;
L = [b, a, c] ;
false.

This will return L=[b,a,c] because it erased the two first 'a' of the list and kept only the third.

If you wish to erase only duplicates the above will not work due to member predicate. You could write :

remove_duplicates2([],[]).

remove_duplicates2([H],[H]).

remove_duplicates2([H ,H| T], List) :-remove_duplicates( [H|T], List).

remove_duplicates2([H,Y | T], [H|T1]):- Y \= H,remove_duplicates( [Y|T], T1).

The predicate remove_duplicates2 is similar to remove_duplicates but with some important differences:

1)You don't use member, but you examine what happens when you have a list [H,H|T] with two SAME elements and you keep only one, so you ignore the first H and call recursively remove_duplicates2([H|T],L).

2)If you have input [H,Y|T] (a list with at least two elements H,Y and a tail of the list) and H\=Y you call remove_duplicates2([Y|T],T1) and you have stored H to the List you want to return.

3) Finally because all clause require at least two elements your base of recursion is the list with one element : remove_duplicates2([H],[H]).

4) The clause remove_duplicates2([],[]). is only used if the input is the empty list ,so it is required to cover this input.

The remove_duplicates2 predicate would give you :

 remove_duplicates2([a,a,b,a,c],L).
L = [a, b, a, c] ;
false.

remove_duplicates2([a,a,b,c],L).
L = [a, b, c] ;
false.

3

votes

Try this, I hope it will help you:

delete3(_,[],[]).
delete3(X,[X|T],R):- delete3(X,T,R).
delete3(X,[H|T],[H|R]) :- delete3(X,T,R).

remove_duplicates([],[]).
remove_duplicates([H|T], [H|R]) :- 
    member(H,T),!,
    delete3(H,T,R1),
    remove_duplicates(R1,R).

remove_duplicates([H|T],[H|R]):-
    remove_duplicates(T,R).

In the second predicate, we verify if the head variable*(H)* is member in the tail of the list*(T)*. If it is then we call the first predicate which deletes all the variables equal to the head variable that are in the tail and so on.

0

votes

% remdup(List, List_No_duplicates).
remdup([],[]).
remdup([H|T],Lnd):- rd1(H,T,T1), remdup(T1,Tnd),Lnd=[H|Tnd].
% rd1(X,L,Lr) removes all occurance of X in List L and store result in List Lr
rd1(_,[],[]).    
rd1(X,[X|T],Lx):-    rd1(X,T,Lx).
rd1(X,[Y|T],Lx):-    rd1(X,T,Lx1),Lx=[Y|Lx1].

How it work - predicate remdup

Take 1st element H of given List and remove all its occurrences in the Tail part T - store this result in List T1; AND
Call remdup recursively on T1 store result in Tnd; AND
Add back the element H to Tnd which is actually the Tail part T that was modified by the recursive call of remdup and so is already free of all duplicates and does not contain any occurrence of H.

predicate - rd1

Removing all occurrences and anything from a blank list is a blank list
If head matches X - then continues to call again on Tail Part
if head doesnt match X - then continue to tail part; AND add unmatched head Y to the modified tail part, which is now free of all occurrences of X

0

votes

You can use list_to_set/2 or sort/2 as said in the answer here.

How to remove duplicates from a list in SWI-Prolog?

6 Answers