6
votes

How to define in ISO Prolog a (meta-logical) predicate for the intersection of two lists of variables that runs in linear time? The variables may appear in any determined order. No implementation dependent property like the "age" of variables must influence the outcome.

In analogy to library(ordsets), let's call the relation varset_intersection(As, Bs, As_cap_Bs).

?- varset_intersection([A,B], [C,D], []).
true.

?-varset_intersection([A,B], [B,A], []).
false.

?- varset_intersection([A,B,C], [C,A,D], Inter).
Inter = [A,C].
or
Inter = [C,A].

?- varset_intersection([A,B],[A,B],[A,C]).
B = C
or
A = B, A = C

?- varset_intersection([A,B,C],[A,B],[A,C]).
idem

That is, the third argument is an output argument, that unifies with the intersection of the first two arguments.

See this list of the built-ins from the current ISO standard (ISO/IEC 13211-1:1995 including Cor.2).

(Note, that I did answer this question in the course of another one several years ago. However, it remains hidden and invisible to Google.)

4
The answer to the first query (varset_intersection([A,B], [B,A], []).) is false, right?Tudor Berariu
What should be the result of the following queries: varset_intersection([A,B],[A,B],[A,C]). and varset_intersection([A,B,C],[A,B],[A,C]). Should these goals be satisfied by unifying the real intersection with the third argument?Tudor Berariu
It should be clear now.false
One thing I need to further understand: by putting that or between different results you mean that the goal should be satisfied only once with any of those two results?Tudor Berariu
I give up... how to do in O(|L1|+|L2|) ? waiting for some clever 'trick'...CapelliC

4 Answers

3
votes

If term_variables/2 works in a time linear with the size of its first argument, then this might work:

varset_intersection(As, Bs, As_cap_Bs):-
    term_variables([As, Bs], As_and_Bs),
    term_variables(As, SetAs),
    append(SetAs, OnlyBs, As_and_Bs),
    term_variables([OnlyBs, Bs], SetBs),
    append(OnlyBs, As_cap_Bs, SetBs).

Each common variable appears only once in the result list no matter how many times it appears in the two given lists.

?- varset_intersection2([A,_C,A,A,A], [A,_B,A,A,A], L).
L = [A].

Also, it might give strange results as in this case:

?- varset_intersection([A,_X,B,C], [B,C,_Y,A], [C, A, B]).
A = B, B = C.

(permutation/2 might help here).

2
votes

If copy_term/2 uses linear time, I believe the following works:

varset_intersection(As, Bs, Cs) :-
    copy_term(As-Bs, CopyAs-CopyBs),
    ground_list(CopyAs),
    select_grounded(CopyBs, Bs, Cs).

ground_list([]).
ground_list([a|Xs]) :-
    ground_list(Xs).

select_grounded([], [], []).
select_grounded([X|Xs], [_|Bs], Cs) :-
    var(X),
    !,
    select_grounded(Xs, Bs, Cs).
select_grounded([_|Xs], [B|Bs], [B|Cs]) :-
    select_grounded(Xs, Bs, Cs).

The idea is to copy both lists in one call to copy_term/2 to preserve shared variables between them, then ground the variables of the first copy, then pick out the original variables of the second list corresponding to the grounded positions of the second copy.

1
votes

If unify_with_occurs_check(Var, ListOfVars) fails or succeeds in constant time, this implementation should yield answers in linear time:

filter_vars([], _, Acc, Acc).
filter_vars([A|As], Bs, Acc, As_cap_Bs):-
    (
        \+ unify_with_occurs_check(A, Bs)
      ->
        filter_vars(As, Bs, [A|Acc], As_cap_Bs)
      ;
        filter_vars(As, Bs, Acc, As_cap_Bs)
    ).

varset_intersection(As, Bs, As_cap_Bs):-
    filter_vars(As, Bs, [], Inter),
    permutation(Inter, As_cap_Bs).

This implementation has problems when given lists contain duplicates:

?- varset_intersection1([A,A,A,A,B], [B,A], L).
L = [B, A, A, A, A] ;

?- varset_intersection1([B,A], [A,A,A,A,B], L).
L = [A, B] ;

Edited : changed bagof/3 to a manually written filter thanks to observation by @false in comments below.

0
votes

A possible solution is to use a Bloom filter. In case of collision, the result might be wrong, but functions with better collision resistance exist. Here is an implementation that uses a single hash function.

sum_codes([], _, Sum, Sum).
sum_codes([Head|Tail], K, Acc,Sum):-
    Acc1 is Head * (256 ** K) + Acc,
    K1 is (K + 1) mod 4,
    sum_codes(Tail, K1, Acc1, Sum).

hash_func(Var, HashValue):-
    with_output_to(atom(A), write(Var)),
    atom_codes(A, Codes),
    sum_codes(Codes, 0, 0, Sum),
    HashValue is Sum mod 1024.

add_to_bitarray(Var, BAIn, BAOut):-
    hash_func(Var, HashValue),
    BAOut is BAIn \/ (1 << HashValue).

bitarray_contains(BA, Var):-
    hash_func(Var, HashValue),
    R is BA /\ (1 << HashValue),
    R > 0.

varset_intersection(As, Bs, As_cap_Bs):-
    foldl(add_to_bitarray, As, 0, BA),
    include(bitarray_contains(BA), Bs, As_cap_Bs).

I know that foldl/4 and include/3 are not ISO, but their implementation is easy.