7
votes

The word LOOP is described as "Resolve the destination of all unresolved occurrences of LEAVE". (emphasis mine)

Unlike IF ... ELSE ... THEN, where the number of forward references is always one, LOOP has no constraints on the quantity of LEAVE. How to implement it then?

One approach I thought of is to always keep the number of LEAVEs on top of the stack. Each LEAVE increments this counter and puts itself under it. LOOP reads the counter from the top and resolves that many references. But it seems like a cheap trick.

How do real Forth systems implement this kind of loop? I don't need teh codez (been implementing Forth as a learning experience), just the concepts.

5

5 Answers

4
votes

In SP-Forth the loop control parameters comprise the index, limit, and address after LOOP. So, no need to resolve LEAVE at compile time, it knows the address from the loop control parameters at run-time.

Another approach is to store the control-flow stack depth on DO, place the unresolved forward reference on the control-flow stack under all other already placed values (using the stored depth) on LEAVE, and then resolve all the placed forward references on LOOP.

See my high-level implementation of DO LOOP on the basis of BEGIN UNTIL and AHEAD THEN (spoiler warning).

2
votes

As another approach, you can thread a singly-linked list through the unresolved address "holes". I used this when I implemented counted loops in my FORTH:

( We can now define DO, ?DO, LOOP, +LOOP and LEAVE. It would be much easier
  if LEAVE didn't exist, but oh well. Because storing LEAVE's data on the stack
  would interfere with other control flow inside the loop, let's store it in a variable. )

VARIABLE LEAVE-PTR

( Let's consider the base case: only one LEAVE in the loop. This can be trivially
  handled by storing the address we need to patch in the variable.

  This would also work quite well with nested loops. All we need to do is store
  the old value of the variable on the stack when opening a loop.

  Finally, we can extend this to an arbitrary number of LEAVEs by threading
  a singly-linked list through the branch target address holes. )

\ ...

: LEAVE, ( -- )
  HERE
  LEAVE-PTR @ ,
  LEAVE-PTR !
;

: LEAVE POSTPONE BRANCH LEAVE, ; IMMEDIATE

: DO ( -- old-leave-ptr loop-beginning )
  LEAVE-PTR @
  0 LEAVE-PTR !
  POSTPONE 2>R
  HERE
; IMMEDIATE

\ SOME-LOOP is the common code between LOOP and +LOOP
: SOME-LOOP ( old-leave-ptr loop-beginning -- )
  POSTPONE 0BRANCH ,
  LEAVE-PTR @
  BEGIN
    ?DUP
  WHILE
    DUP @ >R
    HERE SWAP !
    R>
  REPEAT
  POSTPONE UNLOOP
  LEAVE-PTR !
;
1
votes

Since somebody already posted a great high-level solution I thought it might help to address things from a different perspective. I've recently written a Forth library called Shi in ARM-Thumb2 assembly.

In case you're comfortable with reading assembly code the source of leave can be found here.

The way it works is almost like you described it. I've used a single byte to count the nesting levels of do...loop constructs and a dedicated stack pointer which I called "control-flow stack pointer". This special stack pointer points to the end of the data stack and can push forward references in reverse order. Pushing things in from the other side has the benefit that everything else on the stack stays untouched.

The nesting level and some pointer arithmetic then allows me to resolve all potential forward references a leave might have left regardless of how deeply nested the loop was or how many leaves there were.

1
votes

I found this massively difficult when developing my Forth (like you, as a learning experience).

What I did in the end was this:

  1. each loop has a branch compiled from the top of the loop to the bottom, that's a necessity for ?DO but even for DO, I compile it anyway.
  2. each LEAVE works by unconditionally branching to the TOP of the loop (well, slightly below the top - right to the opening branch), which then all on its own branches back to the clean-up code at the bottom.
  3. That way, there's nothing for LOOP/+LOOP to resolve, it has all been done already.

Beats me how I came up with that wheeze - I read the code again this evening and I still don't understand how I ever got it to work!

0
votes

In fig-Forth, DO and friends have a quite small and simple implementation.

It has a set of primitive words that implement the runtime behavior. These are all implemented in machine code.

(DO)   ( limit start -- ) move the two values to the return stack
I      ( -- index )       fetch the current value of the index variable
LEAVE  ( -- )             set the current index=limit
(LOOP) ( -- )             checks index<limit and branch back if true.

So all the compiler has to do is track the single branch address back to the start of the loop and compile the offset just after (LOOP). These are implemented in high level code.

: BACK  HERE - , ;
: DO  COMPILE (DO) HERE ; IMMEDIATE
: LOOP  COMPILE (LOOP) BACK ; IMMEDIATE

So, LEAVE in this definition doesn't do a branch at all. It just adjusts the stored index value so that LOOP will stop looping once execution gets there.