9
votes

This is a question provoked by an already deleted answer to this question. The issue could be summarized as follows:

Is it possible to fold over a list, with the tail of the list generated while folding?

Here is what I mean. Say I want to calculate the factorial (this is a silly example but it is just for demonstration), and decide to do it like this:

fac_a(N, F) :-
        must_be(nonneg, N),
        (       N =< 1
        ->      F = 1
        ;       numlist(2, N, [H|T]),
                foldl(multiplication, T, H, F)
        ).

multiplication(X, Y, Z) :-
        Z is Y * X.

Here, I need to generate the list that I give to foldl. However, I could do the same in constant memory (without generating the list and without using foldl):

fac_b(N, F) :-
        must_be(nonneg, N),
        (       N =< 1
        ->      F = 1
        ;       fac_b_1(2, N, 2, F)
        ).

fac_b_1(X, N, Acc, F) :-
        (       X < N
        ->      succ(X, X1),
                Acc1 is X1 * Acc,
                fac_b_1(X1, N, Acc1, F)
        ;       Acc = F
        ).

The point here is that unlike the solution that uses foldl, this uses constant memory: no need for generating a list with all values!

Calculating a factorial is not the best example, but it is easier to follow for the stupidity that comes next.

Let's say that I am really afraid of loops (and recursion), and insist on calculating the factorial using a fold. I still would need a list, though. So here is what I might try:

fac_c(N, F) :-
        must_be(nonneg, N),
        (       N =< 1
        ->      F = 1
        ;       foldl(fac_foldl(N), [2|Back], 2-Back, F-[])
        ).

fac_foldl(N, X, Acc-Back, F-Rest) :-
        (       X < N
        ->      succ(X, X1),
                F is Acc * X1,
                Back = [X1|Rest]
        ;       Acc = F,
                Back = []
        ).

To my surprise, this works as intended. I can "seed" the fold with an initial value at the head of a partial list, and keep on adding the next element as I consume the current head. The definition of fac_foldl/4 is almost identical to the definition of fac_b_1/4 above: the only difference is that the state is maintained differently. My assumption here is that this should use constant memory: is that assumption wrong?

I know this is silly, but it could however be useful for folding over a list that cannot be known when the fold starts. In the original question we had to find a connected region, given a list of x-y coordinates. It is not enough to fold over the list of x-y coordinates once (you can however do it in two passes; note that there is at least one better way to do it, referenced in the same Wikipedia article, but this also uses multiple passes; altogether, the multiple-pass algorithms assume constant-time access to neighboring pixels!).

My own solution to the original "regions" question looks something like this:

set_region_rest([A|As], Region, Rest) :-
        sort([A|As], [B|Bs]),
        open_set_closed_rest([B], Bs, Region0, Rest),
        sort(Region0, Region).

open_set_closed_rest([], Rest, [], Rest).
open_set_closed_rest([X-Y|As], Set, [X-Y|Closed0], Rest) :-
        X0 is X-1, X1 is X + 1,
        Y0 is Y-1, Y1 is Y + 1,
        ord_intersection([X0-Y,X-Y0,X-Y1,X1-Y], Set, New, Set0),
        append(New, As, Open),
        open_set_closed_rest(Open, Set0, Closed0, Rest).

Using the same "technique" as above, we can twist this into a fold:

set_region_rest_foldl([A|As], Region, Rest) :-
        sort([A|As], [B|Bs]),
        foldl(region_foldl, [B|Back],
                            closed_rest(Region0, Bs)-Back,
                            closed_rest([], Rest)-[]),
        !,
        sort(Region0, Region).

region_foldl(X-Y,
             closed_rest([X-Y|Closed0], Set)-Back,
             closed_rest(Closed0, Set0)-Back0) :-
        X0 is X-1, X1 is X + 1,
        Y0 is Y-1, Y1 is Y + 1,
        ord_intersection([X0-Y,X-Y0,X-Y1,X1-Y], Set, New, Set0),
        append(New, Back0, Back).

This also "works". The fold leaves behind a choice point, because I haven't articulated the end condition as in fac_foldl/4 above, so I need a cut right after it (ugly).

The Questions

  • Is there a clean way of closing the list and removing the cut? In the factorial example, we know when to stop because we have additional information; however, in the second example, how do we notice that the back of the list should be the empty list?
  • Is there a hidden problem I am missing?
  • This looks like its somehow similar to the Implicit State with DCGs, but I have to admit I never quite got how that works; are these connected?
3
Oops, wasn't paying attention. Thought he means the question was deleted. - Mostowski Collapse
This is a SWI-Prolog specific question. It assumes predicates that are neither standard built-in predicates or standard library predicates such as must_be/2 and foldl/4. They aren't even de facto standard predicates. I would re-add the swi-prolog tag but users that like to pretend otherwise would simply delete again. Politics instead of facts. Sad. - Paulo Moura
@PauloMoura I agree with you and have added the tag. Too many times have I seen the [swi-prolog] tag deleted from questions that I didn't even want to bother putting it there in the first place. I didn't know, for example, that must_be/2 and foldl/4 are SWI-Prolog specific :/ - user1812457
How is "fold over a partial list" SWI-specific? foldl/4 is definitely not SWI-specific. It even appears in Richard O'Keefe's library proposal. Any beginner can implement it in any Prolog system. The swi-prolog tag should be reserved for questions that are clearly SWI-specific, so that users find these pertaining questions more easily. Tagging everything where a single predicate that is provided by SWI is used anywhere as "SWI" makes it impossible to find such instances. - mat
@mat I was just reading the same: see down for "Higher order list predicates". foldl/4 is right there. must_be/2, however, isn't. Is it in a standard? - user1812457

3 Answers

3
votes

You are touching on several extremely interesting aspects of Prolog, each well worth several separate questions on its own. I will provide a high-level answer to your actual questions, and hope that you post follow-up questions on the points that are most interesting to you.

First, I will trim down the fragment to its essence:

essence(N) :-
        foldl(essence_(N), [2|Back], Back, _).

essence_(N, X0, Back, Rest) :-
        (   X0 #< N ->
            X1 #= X0 + 1,
            Back = [X1|Rest]
        ;   Back = []
        ).

Note that this prevents the creation of extremely large integers, so that we can really study the memory behaviour of this pattern.

To your first question: Yes, this runs in O(1) space (assuming constant space for arising integers).

Why? Because although you continuously create lists in Back = [X1|Rest], these lists can all be readily garbage collected because you are not referencing them anywhere.

To test memory aspects of your program, consider for example the following query, and limit the global stack of your Prolog system so that you can quickly detect growing memory by running out of (global) stack:

?- length(_, E),
   N #= 2^E,
   portray_clause(N),
   essence(N),
   false.

This yields:

1.
2.
...
8388608.
16777216.
etc.

It would be completely different if you referenced the list somewhere. For example:

essence(N) :-
        foldl(essence_(N), [2|Back], Back, _),
        Back = [].

With this very small change, the above query yields:

?- length(_, E),
   N #= 2^E,
   portray_clause(N),
   essence(N),
   false.
1.
2.
...
1048576.
ERROR: Out of global stack

Thus, whether a term is referenced somewhere can significantly influence the memory requirements of your program. This sounds quite frightening, but really is hardly an issue in practice: You either need the term, in which case you need to represent it in memory anyway, or you don't need the term, in which case it is simply no longer referenced in your program and becomes amenable to garbage collection. In fact, the amazing thing is rather that GC works so well in Prolog also for quite complex programs that not much needs to be said about it in many situations.


On to your second question: Clearly, using (->)/2 is almost always highly problematic in that it limits you to a particular direction of use, destroying the generality we expect from logical relations.

There are several solutions for this. If your CLP(FD) system supports zcompare/3 or a similar feature, you can write essence_/3 as follows:

essence_(N, X0, Back, Rest) :-
        zcompare(C, X0, N),
        closing(C, X0, Back, Rest).

closing(<, X0, [X1|Rest], Rest) :- X1 #= X0 + 1.
closing(=, _, [], _).

Another very nice meta-predicate called if_/3 was recently introduced in Indexing dif/2 by Ulrich Neumerkel and Stefan Kral. I leave implementing this with if_/3 as a very worthwhile and instructive exercise. Discussing this is well worth its own question!


On to the third question: How do states with DCGs relate to this? DCG notation is definitely useful if you want to pass around a global state to several predicates, where only a few of them need to access or modify the state, and most of them simply pass the state through. This is completely analogous to monads in Haskell.

The "normal" Prolog solution would be to extend each predicate with 2 arguments to describe the relation between the state before the call of the predicate, and the state after it. DCG notation lets you avoid this hassle.

Importantly, using DCG notation, you can copy imperative algorithms almost verbatim to Prolog, without the hassle of introducing many auxiliary arguments, even if you need global states. As an example for this, consider a fragment of Tarjan's strongly connected components algorithm in imperative terms:

  function strongconnect(v)
    // Set the depth index for v to the smallest unused index
    v.index := index
    v.lowlink := index
    index := index + 1
    S.push(v)

This clearly makes use of a global stack and index, which ordinarily would become new arguments that you need to pass around in all your predicates. Not so with DCG notation! For the moment, assume that the global entities are simply easily accessible, and so you can code the whole fragment in Prolog as:

scc_(V) -->
        vindex_is_index(V),
        vlowlink_is_index(V),
        index_plus_one,
        s_push(V),

This is a very good candidate for its own question, so consider this a teaser.


At last, I have a general remark: In my view, we are only at the beginning of finding a series of very powerful and general meta-predicates, and the solution space is still largely unexplored. call/N, maplist/[3,4], foldl/4 and other meta-predicates are definitely a good start. if_/3 has the potential to combine good performance with the generality we expect from Prolog predicates.

0
votes

If your Prolog implementation supports freeze/2 or similar predicate (e.g. Swi-Prolog), then you can use following approach:

fac_list(L, N, Max) :-
    (N >= Max, L = [Max], !)
    ;
    freeze(L, (
        L = [N|Rest],
        N2 is N + 1,
        fac_list(Rest, N2, Max)
    )).

multiplication(X, Y, Z) :-
    Z is Y * X.

factorial(N, Factorial) :-
    fac_list(L, 1, N),
    foldl(multiplication, L, 1, Factorial).

Example above first defines a predicate (fac_list) which creates a "lazy" list of increasing integer values starting from N up to maximum value (Max), where next list element is generated only after previous one was "accessed" (more on that below). Then, factorial just folds multiplication over lazy list, resulting in constant memory usage.

The key to understanding how this example works is remembering that Prolog lists are, in fact, just terms of arity 2 with name '.' (actually, in Swi-Prolog 7 the name was changed, but this is not important for this discussion), where first element represents list item and the second element represents tail (or terminating element - empty list, []). For example. [1, 2, 3] can be represented as:

.(1, .(2, .(3, [])))

Then, freeze is defined as follows:

freeze(+Var, :Goal)
    Delay the execution of Goal until Var is bound

This means if we call:

freeze(L, L=[1|Tail]), L = [A|Rest].

then following steps will happen:

  1. freeze(L, L=[1|Tail]) is called
  2. Prolog "remembers" that when L will be unified with "anything", it needs to call L=[1|Tail]
  3. L = [A|Rest] is called
  4. Prolog unifies L with .(A, Rest)
  5. This unification triggers execution of L=[1|Tail]
  6. This, obviously, unifies L, which at this point is bound to .(A, Rest), with .(1, Tail)
  7. As a result, A gets unified with 1.

We can extend this example as follows:

freeze(L1, L1=[1|L2]),
freeze(L2, L2=[2|L3]),
freeze(L3, L3=[3]),
L1 = [A|R2], % L1=[1|L2] is called at this point
R2 = [B|R3], % L2=[2|L3] is called at this point
R3 = [C].    % L3=[3] is called at this point

This works exactly like the previous example, except that it gradually generates 3 elements, instead of 1.

0
votes

As per Boris's request, the second example implemented using freeze. Honestly, I'm not quite sure whether this answers the question, as the code (and, IMO, the problem) is rather contrived, but here it is. At least I hope this will give other people the idea what freeze might be useful for. For simplicity, I am using 1D problem instead of 2D, but changing the code to use 2 coordinates should be rather trivial.

The general idea is to have (1) function that generates new Open/Closed/Rest/etc. state based on previous one, (2) "infinite" list generator which can be told to "stop" generating new elements from the "outside", and (3) fold_step function which folds over "infinite" list, generating new state on each list item and, if that state is considered to be the last one, tells generator to halt.

It is worth to note that list's elements are used for no other reason but to inform generator to stop. All calculation state is stored inside accumulator.

Boris, please clarify whether this gives a solution to your problem. More precisely, what kind of data you were trying to pass to fold step handler (Item, Accumulator, Next Accumulator)?

adjacent(X, Y) :-
    succ(X, Y) ;
    succ(Y, X).

state_seq(State, L) :-
    (State == halt -> L = [], !)
    ;
    freeze(L, (
        L = [H|T],
        freeze(H, state_seq(H, T))
    )).

fold_step(Item, Acc, NewAcc) :-
    next_state(Acc, NewAcc),
    NewAcc = _:_:_:NewRest,
    (var(NewRest) ->
        Item = next ;
        Item = halt
    ).

next_state(Open:Set:Region:_Rest, NewOpen:NewSet:NewRegion:NewRest) :-
    Open = [],
    NewOpen = Open,
    NewSet = Set,
    NewRegion = Region,
    NewRest = Set.

next_state(Open:Set:Region:Rest, NewOpen:NewSet:NewRegion:NewRest) :-
    Open = [H|T],
    partition(adjacent(H), Set, Adjacent, NotAdjacent),
    append(Adjacent, T, NewOpen),
    NewSet = NotAdjacent,
    NewRegion = [H|Region],
    NewRest = Rest.

set_region_rest(Ns, Region, Rest) :-
    Ns = [H|T],
    state_seq(next, L),
    foldl(fold_step, L, [H]:T:[]:_, _:_:Region:Rest).

One fine improvement to the code above would be making fold_step a higher order function, passing it next_state as the first argument.