Fold over a partial list

Question

This question was prompted by a now-deleted answer to this question. The issue could be summarized as follows:

Is it possible to fold over a list, with the tail of the list generated while folding?

Here is what I mean. Say I want to calculate the factorial (this is a silly example but it is just for demonstration), and decide to do it like this:

fac_a(N, F) :-
        must_be(nonneg, N),
        (       N =< 1
        ->      F = 1
        ;       numlist(2, N, [H|T]),
                foldl(multiplication, T, H, F)
        ).

multiplication(X, Y, Z) :-
        Z is Y * X.

Here, I need to generate the list that I give to foldl. However, I could do the same in constant memory (without generating the list and without using foldl):

fac_b(N, F) :-
        must_be(nonneg, N),
        (       N =< 1
        ->      F = 1
        ;       fac_b_1(2, N, 2, F)
        ).

fac_b_1(X, N, Acc, F) :-
        (       X < N
        ->      succ(X, X1),
                Acc1 is X1 * Acc,
                fac_b_1(X1, N, Acc1, F)
        ;       Acc = F
        ).

The point here is that unlike the solution that uses foldl, this uses constant memory: no need for generating a list with all values!
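The constant-memory pattern of fac_b_1/4 can also be packaged as a reusable fold over an integer range, so that no list is ever built. This is only a sketch: num_foldl/5 is a hypothetical helper, not a library predicate. It calls Goal on each integer from Low to High in turn, threading the accumulator just as foldl/4 would.

```prolog
:- use_module(library(error)).   % must_be/2

% num_foldl(:Goal, +Low, +High, +V0, -V)
% Hypothetical helper: fold Goal over Low, Low+1, ..., High
% without materialising the list of integers.
num_foldl(Goal, Low, High, V0, V) :-
        (   Low > High
        ->  V = V0
        ;   call(Goal, Low, V0, V1),
            Low1 is Low + 1,
            num_foldl(Goal, Low1, High, V1, V)   % last call: stack does not grow
        ).

multiplication(X, Y, Z) :-
        Z is Y * X.

fac_d(N, F) :-
        must_be(nonneg, N),
        num_foldl(multiplication, 2, N, 1, F).
```

For example, `?- fac_d(5, F).` gives `F = 120`. The price is that the "loop" moves into num_foldl/5 instead of being expressed with foldl/4 over a list.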

Calculating a factorial is not the best example, but it makes the silliness that comes next easier to follow.

Let's say that I am really afraid of loops (and recursion), and insist on calculating the factorial using a fold. I would still need a list, though. So here is what I might try:

fac_c(N, F) :-
        must_be(nonneg, N),
        (       N =< 1
        ->      F = 1
        ;       foldl(fac_foldl(N), [2|Back], 2-Back, F-[])
        ).

fac_foldl(N, X, Acc-Back, F-Rest) :-
        (       X < N
        ->      succ(X, X1),
                F is Acc * X1,
                Back = [X1|Rest]
        ;       Acc = F,
                Back = []
        ).

To my surprise, this works as intended. I can "seed" the fold with an initial value at the head of a partial list, and keep on adding the next element as I consume the current head. The definition of fac_foldl/4 is almost identical to the definition of fac_b_1/4 above: the only difference is that the state is maintained differently. My assumption here is that this should use constant memory: is that assumption wrong?
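Why does a partial list work at all? A foldl/4 defined in the usual way inspects the tail of the list only after the goal has been called on the current element, so the goal is free to bind that tail first. The sketch below uses such a definition (to our knowledge, SWI-Prolog's library(apply) has essentially this shape), renamed to the hypothetical my_foldl/4 to avoid clashing with the library predicate, and runs fac_foldl/4 from above through it.

```prolog
:- use_module(library(error)).   % must_be/2

% my_foldl/4: a plain definition of foldl/4, renamed (hypothetical name).
my_foldl(Goal, List, V0, V) :-
        my_foldl_(List, Goal, V0, V).

my_foldl_([], _, V, V).
my_foldl_([X|Xs], Goal, V0, V) :-
        call(Goal, X, V0, V1),          % Goal may bind the still-open tail Xs ...
        my_foldl_(Xs, Goal, V1, V).     % ... which is only inspected here

fac_c_my(N, F) :-
        must_be(nonneg, N),
        (       N =< 1
        ->      F = 1
        ;       my_foldl(fac_foldl(N), [2|Back], 2-Back, F-[])
        ).

fac_foldl(N, X, Acc-Back, F-Rest) :-
        (       X < N
        ->      succ(X, X1),
                F is Acc * X1,
                Back = [X1|Rest]
        ;       Acc = F,
                Back = []
        ).
```

Because both branches of fac_foldl/4 bind Back, the tail is always instantiated by the time my_foldl_/4 looks at it, and first-argument indexing keeps each step deterministic.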

I know this is silly, but it could be useful for folding over a list that is not known when the fold starts. In the original question, we had to find a connected region, given a list of x-y coordinates. Folding over the list of x-y coordinates once is not enough. (You can, however, do it in two passes. There is at least one better way to do it, referenced in the same Wikipedia article, but it also uses multiple passes. All of these multiple-pass algorithms assume constant-time access to neighboring pixels!)

My own solution to the original "regions" question looks something like this:

set_region_rest([A|As], Region, Rest) :-
        sort([A|As], [B|Bs]),
        open_set_closed_rest([B], Bs, Region0, Rest),
        sort(Region0, Region).

open_set_closed_rest([], Rest, [], Rest).
open_set_closed_rest([X-Y|As], Set, [X-Y|Closed0], Rest) :-
        X0 is X-1, X1 is X + 1,
        Y0 is Y-1, Y1 is Y + 1,
        ord_intersection([X0-Y,X-Y0,X-Y1,X1-Y], Set, New, Set0),
        append(New, As, Open),
        open_set_closed_rest(Open, Set0, Closed0, Rest).
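set_region_rest/3 extracts one region and returns the remaining points, so splitting all points into regions only requires calling it again on the rest. A minimal sketch, assuming a hypothetical driver all_regions/2 (not part of the original solution); set_region_rest/3 and open_set_closed_rest/4 are repeated unchanged so that the block loads on its own.

```prolog
:- use_module(library(ordsets)).  % ord_intersection/4
:- use_module(library(lists)).    % append/3

% all_regions(+Points, -Regions): hypothetical driver that splits a
% list of X-Y points into its 4-connected regions.
all_regions([], []).
all_regions([P|Ps], [Region|Regions]) :-
        set_region_rest([P|Ps], Region, Rest),
        all_regions(Rest, Regions).

set_region_rest([A|As], Region, Rest) :-
        sort([A|As], [B|Bs]),
        open_set_closed_rest([B], Bs, Region0, Rest),
        sort(Region0, Region).

open_set_closed_rest([], Rest, [], Rest).
open_set_closed_rest([X-Y|As], Set, [X-Y|Closed0], Rest) :-
        X0 is X-1, X1 is X + 1,
        Y0 is Y-1, Y1 is Y + 1,
        % The neighbour list is already in standard order.
        % New: unvisited neighbours; Set0: points still unvisited.
        ord_intersection([X0-Y,X-Y0,X-Y1,X1-Y], Set, New, Set0),
        append(New, As, Open),
        open_set_closed_rest(Open, Set0, Closed0, Rest).
```

For example, `?- all_regions([3-3,1-2,1-1], Rs).` gives `Rs = [[1-1,1-2],[3-3]]`.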

Using the same "technique" as above, we can twist this into a fold:

set_region_rest_foldl([A|As], Region, Rest) :-
        sort([A|As], [B|Bs]),
        foldl(region_foldl, [B|Back],
                            closed_rest(Region0, Bs)-Back,
                            closed_rest([], Rest)-[]),
        !,
        sort(Region0, Region).

region_foldl(X-Y,
             closed_rest([X-Y|Closed0], Set)-Back,
             closed_rest(Closed0, Set0)-Back0) :-
        X0 is X-1, X1 is X + 1,
        Y0 is Y-1, Y1 is Y + 1,
        ord_intersection([X0-Y,X-Y0,X-Y1,X1-Y], Set, New, Set0),
        append(New, Back0, Back).

This also "works". The fold leaves behind a choice point, because I haven't articulated the end condition as in fac_foldl/4 above, so I need a cut right after it (ugly).
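One possible way to articulate such an end condition, in the spirit of fac_foldl/4, is to keep in the state the number of points that are queued but not yet processed; when that number drops to zero, the step itself closes the list. This is a sketch under that assumption: the counter and the s/3 state term are additions, not part of the original code. Since the list is closed explicitly, foldl/4 can terminate through first-argument indexing and no cut is needed after it.

```prolog
:- use_module(library(apply)).    % foldl/4
:- use_module(library(ordsets)).  % ord_intersection/4
:- use_module(library(lists)).    % append/3

set_region_rest_foldl([A|As], Region, Rest) :-
        sort([A|As], [B|Bs]),
        % State: s(Closed, Unvisited, Queued)-OpenTail; initially only B is queued.
        foldl(region_foldl, [B|Back],
                            s(Region0, Bs, 1)-Back,
                            s([], Rest, 0)-[]),
        sort(Region0, Region).

region_foldl(X-Y,
             s([X-Y|Closed0], Set, Q0)-Back,
             s(Closed0, Set0, Q)-Back0) :-
        X0 is X-1, X1 is X + 1,
        Y0 is Y-1, Y1 is Y + 1,
        ord_intersection([X0-Y,X-Y0,X-Y1,X1-Y], Set, New, Set0),
        length(New, K),
        Q is Q0 - 1 + K,          % X-Y is done; the points in New are now queued
        append(New, Back0, Back),
        (   Q =:= 0
        ->  Back0 = []            % nothing left to visit: close the list
        ;   true
        ).
```

This still uses (->)/2 for the end test, so it inherits the directionality concerns of fac_foldl/4.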

The Questions

1. Is there a clean way of closing the list and removing the cut? In the factorial example, we know when to stop because we have additional information; in the second example, however, how do we notice that the back of the list should be the empty list?
2. Is there a hidden problem I am missing?
3. This looks like it is somehow similar to the Implicit State with DCGs, but I have to admit I never quite understood how that works; are these connected?

Oops, wasn't paying attention. Thought he means the question was deleted. — Mostowski Collapse
This is a SWI-Prolog specific question. It assumes predicates such as must_be/2 and foldl/4 that are neither standard built-in predicates nor standard library predicates. They aren't even de facto standard predicates. I would re-add the swi-prolog tag, but users who like to pretend otherwise would simply delete it again. Politics instead of facts. Sad. — Paulo Moura
@PauloMoura I agree with you and have added the tag. I have seen the [swi-prolog] tag deleted from questions so many times that I stopped bothering to add it in the first place. I didn't know, for example, that must_be/2 and foldl/4 are SWI-Prolog specific :/ — user1812457
How is "fold over a partial list" SWI-specific? foldl/4 is definitely not SWI-specific. It even appears in Richard O'Keefe's library proposal. Any beginner can implement it in any Prolog system. The swi-prolog tag should be reserved for questions that are clearly SWI-specific, so that users find the pertinent questions more easily. Tagging as "SWI" every question that happens to use some predicate provided by SWI makes it impossible to find the genuinely SWI-specific ones. — mat
@mat I was just reading the same: see down for "Higher order list predicates". foldl/4 is right there. must_be/2, however, isn't. Is it in a standard? — user1812457

mat · Accepted Answer · 2016-09-16T17:07:23

You are touching on several extremely interesting aspects of Prolog, each well worth several separate questions on its own. I will provide a high-level answer to your actual questions, and hope that you post follow-up questions on the points that are most interesting to you.

First, I will trim down the fragment to its essence:

:- use_module(library(clpfd)).

essence(N) :-
        foldl(essence_(N), [2|Back], Back, _).

essence_(N, X0, Back, Rest) :-
        (   X0 #< N ->
            X1 #= X0 + 1,
            Back = [X1|Rest]
        ;   Back = []
        ).

Note that this version no longer computes the factorial: it only counts from 2 to N. This prevents the creation of extremely large integers, so that we can study the memory behaviour of the pattern itself.

To your first question: Yes, this runs in O(1) space (assuming that the integers that arise take constant space).

Why? Because although you continuously create new list cells with Back = [X1|Rest], nothing references the beginning of the list: once foldl/4 has moved past a cell, that cell is unreachable and can be garbage collected.

To test memory aspects of your program, consider for example the following query, and limit the global stack of your Prolog system so that you can quickly detect growing memory by running out of (global) stack:

?- length(_, E),
   N #= 2^E,
   portray_clause(N),
   essence(N),
   false.

This yields:

1.
2.
...
8388608.
16777216.
etc.

It would be completely different if you referenced the list somewhere. For example:

essence(N) :-
        foldl(essence_(N), [2|Back], Back, _),
        Back = [].

With this very small change, the above query yields:

?- length(_, E),
   N #= 2^E,
   portray_clause(N),
   essence(N),
   false.
1.
2.
...
1048576.
ERROR: Out of global stack

Thus, whether a term is referenced somewhere can significantly influence the memory requirements of your program. This sounds quite frightening, but really is hardly an issue in practice: You either need the term, in which case you need to represent it in memory anyway, or you don't need the term, in which case it is simply no longer referenced in your program and becomes amenable to garbage collection. In fact, the amazing thing is rather that GC works so well in Prolog also for quite complex programs that not much needs to be said about it in many situations.

On to your second question: Clearly, using (->)/2 is almost always highly problematic in that it limits you to a particular direction of use, destroying the generality we expect from logical relations.

There are several solutions for this. If your CLP(FD) system supports zcompare/3 or a similar feature, you can write essence_/4 as follows:

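essence_(N, X0, Back, Rest) :-
        zcompare(C, X0, N),
        closing(C, X0, Back, Rest).

% closing(Order, X0, Back, Rest): Order is the result of comparing X0 with N.
closing(<, X0, [X1|Rest], Rest) :- X1 #= X0 + 1.
closing(=, _, [], _).
closing(>, _, [], _).   % X0 > N (e.g. N = 1): close the list, like the else branch above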

Another very nice meta-predicate called if_/3 was recently introduced in Indexing dif/2 by Ulrich Neumerkel and Stefan Kral. I leave implementing this with if_/3 as a very worthwhile and instructive exercise. Discussing this is well worth its own question!
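For readers who want to compare their solution to that exercise, here is one possible sketch. It assumes the straightforward definition of if_/3 as we understand it from that paper (copied here so the block is self-contained) and a hypothetical reified comparison lt_t/3 built on CLP(FD) reification; other formulations are certainly possible.

```prolog
:- use_module(library(clpfd)).
:- use_module(library(apply)).   % foldl/4

% Straightforward definition of if_/3.
if_(If_1, Then_0, Else_0) :-
        call(If_1, T),
        (   T == true  -> call(Then_0)
        ;   T == false -> call(Else_0)
        ;   nonvar(T)  -> throw(error(type_error(boolean, T), _))
        ;   throw(error(instantiation_error, _))
        ).

% lt_t(X, Y, T): hypothetical reified version of X #< Y,
% with T = true or T = false.
lt_t(X, Y, T) :-
        X #< Y #<==> B,
        bool_t(B, T).

bool_t(1, true).
bool_t(0, false).

essence_(N, X0, Back, Rest) :-
        if_(lt_t(X0, N),
            ( X1 #= X0 + 1, Back = [X1|Rest] ),
            Back = []).

essence(N) :-
        foldl(essence_(N), [2|Back], Back, _).
```

For integer arguments the comparison is decided immediately, so each step is deterministic; if X0 or N are unbound, bool_t/2 lets both outcomes be explored instead of committing to one, unlike (->)/2.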

On to the third question: How do states with DCGs relate to this? DCG notation is definitely useful if you want to pass around a global state to several predicates, where only a few of them need to access or modify the state, and most of them simply pass the state through. This is closely analogous to the state monad in Haskell.

The "normal" Prolog solution would be to extend each predicate with 2 arguments to describe the relation between the state before the call of the predicate, and the state after it. DCG notation lets you avoid this hassle.
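To illustrate the two styles on a tiny example, the sketch below threads a counter once with explicit State0/State arguments and once implicitly with DCG notation, where the state is the single element of the list that phrase/3 passes along. All names are hypothetical and chosen only for this illustration; the pushback notation `Head, [S] --> ...` used for incr//0 is supported by SWI-Prolog's DCG translation.

```prolog
% Explicit threading: every predicate carries State0/State arguments.
incr_explicit(S0, S) :-
        S is S0 + 1.

twice_explicit(S0, S) :-
        incr_explicit(S0, S1),
        incr_explicit(S1, S).

% Implicit threading: read the current state S0 from the DCG list and
% push back the new state S in its place.
incr, [S] --> [S0], { S is S0 + 1 }.

% Intermediate states are not mentioned at all.
twice --> incr, incr.
```

For example, `?- phrase(twice, [0], [S]).` gives `S = 2`, just like `?- twice_explicit(0, S).`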

Importantly, using DCG notation, you can copy imperative algorithms almost verbatim to Prolog, without the hassle of introducing many auxiliary arguments, even if you need global states. As an example for this, consider a fragment of Tarjan's strongly connected components algorithm in imperative terms:

  function strongconnect(v)
    // Set the depth index for v to the smallest unused index
    v.index := index
    v.lowlink := index
    index := index + 1
    S.push(v)

This clearly makes use of a global stack and index, which ordinarily would become new arguments that you need to pass around in all your predicates. Not so with DCG notation! For the moment, assume that the global entities are simply easily accessible, and so you can code the whole fragment in Prolog as:

% Fragment: the first statements of strongconnect(v).
scc_(V) -->
        vindex_is_index(V),
        vlowlink_is_index(V),
        index_plus_one,
        s_push(V).

This is a very good candidate for its own question, so consider this a teaser.
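To make the teaser a little more concrete, here is one way the "easily accessible" global entities could be provided: thread a single state term through the DCG and access it with a nonterminal state//2 that uses pushback. This is a sketch under assumptions: the state term st/4, the use of library(assoc) to represent v.index and v.lowlink, and state//2 itself are illustrative choices, not part of the original answer.

```prolog
:- use_module(library(assoc)).   % empty_assoc/1, put_assoc/4, get_assoc/3

% state(S0, S): the current state is S0; replace it by S.
state(S0, S), [S] --> [S0].

% Hypothetical state term: st(Index, Stack, Indices, Lowlinks), where
% Indices and Lowlinks are AVL trees mapping vertices to integers.
vindex_is_index(V) -->
        state(st(I, S, Is0, Ls), st(I, S, Is, Ls)),
        { put_assoc(V, Is0, I, Is) }.

vlowlink_is_index(V) -->
        state(st(I, S, Is, Ls0), st(I, S, Is, Ls)),
        { put_assoc(V, Ls0, I, Ls) }.

index_plus_one -->
        state(st(I0, S, Is, Ls), st(I, S, Is, Ls)),
        { I is I0 + 1 }.

s_push(V) -->
        state(st(I, S, Is, Ls), st(I, [V|S], Is, Ls)).

scc_(V) -->
        vindex_is_index(V),
        vlowlink_is_index(V),
        index_plus_one,
        s_push(V).
```

Usage: `?- empty_assoc(E), phrase(scc_(a), [st(0, [], E, E)], [st(I, S, _, _)]).` gives `I = 1, S = [a]`. Note that scc_//1 itself is written exactly as in the fragment; only the four helper nonterminals know about the shape of the state.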

At last, I have a general remark: In my view, we are only at the beginning of finding a series of very powerful and general meta-predicates, and the solution space is still largely unexplored. call/N, maplist/[3,4], foldl/4 and other meta-predicates are definitely a good start. if_/3 has the potential to combine good performance with the generality we expect from Prolog predicates.
