20
votes

I'm writing a Lisp in Haskell (code at GitHub) as a way of learning more about both languages.

The newest feature that I'm adding is macros. Not hygienic macros or anything fancy - just plain vanilla code transformations. My initial implementation had a separate macro environment, distinct from the environment that all other values live in. Between the read and eval functions I interspersed another function, macroExpand, which walked the code tree and performed the appropriate transformations whenever it found a keyword in the macro environment, before the final form was passed on to eval to be evaluated. A nice advantage of this was that macros had the same internal representation as other functions, which reduced some code duplication.

Having two environments seemed clunky though, and it annoyed me that if I wanted to load a file, eval had to have access to the macro environment in case the file contained macro definitions. So I decided to introduce a macro type, store macros in the same environment as functions and variables, and incorporate the macro expansion phase into eval. I was at first at a bit of a loss for how to do it, until I figured that I could just write this code:

eval env (List (function : args)) = do
    func <- eval env function
    case func of 
        (Macro {}) -> apply func args >>= eval env
        _          -> mapM (eval env) args >>= apply func

It works as follows:

  1. If you are passed a list containing an initial expression and a bunch of other expressions...
  2. Evaluate the first expression
  3. If it's a macro, then apply it to the arguments, and evaluate the result
  4. If it's not a macro, then evaluate the arguments and apply the function to the result

It's as though macros are exactly the same as functions, except the order of eval/apply is switched.

Is this an accurate description of macros? Am I missing something important by implementing macros in this way? If the answers are "yes" and "no", then why have I never seen macros explained this way before?

6
This is a really great example of how implementing a language makes you learn a whole lot more about it.Nate C-K

6 Answers

21
votes

The answers are "no" and "yes".

It looks like you've started with a good model of macros, where the macro level and the runtime level in in separate worlds. In fact, this is one of the main points behind Racket's macro system. You can read some brief text about it in the Racket guide, or see the original paper that describes this feature and why it's a good idea to do that. Note that Racket's macro system is a very sophisticated one, and it's hygienic -- but phase separation is a good idea regardless of hygiene. To summarize the main advantage, it makes it possible to always expand code in a reliable way, so you get benefits like separate compilation, and you don't depend on code loading order and such problems.

Then, you moved into a single environment, which loses that. In most of the Lisp world (eg, in CL and in Elisp), this is exactly how things are done -- and obviously, you run into the problems that are described above. ("Obvious" since phase separation was designed to avoid these, you just happened to get your discoveries in the opposite order from how they happened historically.) In any case, to address some of these problems, there is the eval-when special form, which can specify that some code is evaluated at run-time or at macro-expansion-time. In Elisp you get that with eval-when-compile, but in CL you get much more hair, with a few other "*-time"s. (CL also has read-time, and having that share the same environment as everything else is triple the fun.) Even if it seems like a good idea, you should read around around and see how some lispers lose hair because of this mess.

And in the last step of your description you step even further back in time and discover something that is known as FEXPRs. I won't even put any pointers for that, you can find a ton of texts about it, why some people think that it's a really bad idea, why some other people think that it's a really good idea. Practically speaking, those two "some"s are "most" and "few" respectively -- though the few remaining FEXPR strongholds can be vocal. To translate all of this: it's explosive stuff... Asking questions about it is a good way to get long flamewars. (As a recent example of a serious discussion you can see the initial discussion period for the R7RS, where FEXPRs came up and lead to exactly these kinds of flames.) No matter which side you choose to sit at, one thing is obvious: a language with FEXPRs is extremely different than a language without them. [Coincidentally, working on an implementation in Haskell might affect your view, since you have a place to go to for a sane static world for code, so the temptation in "cute" super-dynamic languages is probably bigger...]

One last note: since you're doing something similar, you should look into a similar project of implementing a Scheme in Haskell -- IIUC, it even has hygienic macros.

16
votes

Not quite. Actually, you've pretty concisely described the difference between "call by name" and "call by value"; a call-by-value language reduces arguments to values before substitution, a call-by-name language performs substitution first, then reduction.

The key difference is that macros allow you to break referential transparency; in particular, the macro can examine the code, and thus can differentiate between (3 + 4) and 7, in a way that ordinary code can't. That's why macros are both more powerful and also more dangerous; most programmers would be upset if they found that (f 7) produced one result and (f (+ 3 4)) produced a different result.

6
votes

Background Rambling

What you have there is very late binding macros. This is a workable approach, but it is inefficient, because repeated executions of the same code will repeatedly expand the macros.

On the positive side, this is friendly for interactive development. If the programmer changes a macro, and then re-invokes some code which uses it, such as a previously defined function, the new macro instantly takes effect. This is an intuitive "do what I mean" behavior.

Under a macro system which expands macros earlier, the programmer has to redefine all of the functions that depend on a macro when that macro changes, otherwise the existing definitions continue to be based on the old macro expansions, oblivious to the new version of the macro.

A reasonable approach is to have this late binding macro system for interpreted code, but a "regular" (for lack of a better word) macro system for compiled code.

Expanding macros does not require a separate environment. It should not, because local macros should be in the same namespace as variables. For instance in Common Lisp if we do this (let (x) (symbol-macrolet ((x 'foo)) ...)), the inner symbol macro shadows the outer lexical variable. The macro expander has to be aware of the variable binding forms. And vice versa! If there is an inner let for the variable x, it shadows an outer symbol-macrolet. The macro expander cannot just blindly substitute all occurrences of x that occur in the body. So in other words, Lisp macro expansion has to be aware of the full lexical environment in which macros and other kinds of bindings coexist. Of course, during macro expansion, you don't instantiate the environment in the same way. Of course if there is a (let ((x (function)) ..), (function) is not called and x is not given a value. But the macro expander is aware that there is an x in this environment and so occurrences of x are not macros.

So when we say one environment, what we really mean is that there are two different manifestations or instantiations of a unified environment: the expansion-time manifestation and then the evaluation-time manifestation. Late-binding macros simplify the implementation by merging these two times into one, as you have done, but it does not have to be that way.

Also note that Lisp macros can accept an &environment parameter. This is needed if the macros need to call macroexpand on some piece of code supplied by the user. Such a recursion back into the macro expander through a macro has to pass the proper environment so the user's code has access to its lexically surrounding macros and gets expanded properly.

Concrete Example

Suppose we have this code:

(symbol-macrolet ((x (+ 2 2)))
   (print x)
   (let ((x 42)
         (y 19))
     (print x)
     (symbol-macrolet ((y (+ 3 3)))
       (print y))))

The effect of this to prints 4, 42 and 6. Let's use the CLISP implementation of Common Lisp, and expand this using CLISP's implementation-specific function called system::expand-form. We cannot use regular, standard macroexpand because that will not recurse into the local macros:

(system::expand-form   
  '(symbol-macrolet ((x (+ 2 2)))
     (print x)
     (let ((x 42)
           (y 19))
       (print x)
       (symbol-macrolet ((y (+ 3 3)))
         (print y)))))

-->

(LOCALLY    ;; this code was reformatted by hand to fit your screen
  (PRINT (+ 2 2))
  (LET ((X 42) (Y 19))
    (PRINT X)
    (LOCALLY (PRINT (+ 3 3))))) ;

(Now firstly, about these locally forms. Why are they there? Note that they correspond to places where we had a symbol-macrolet. This is probably for the sake of declarations. If the body of a symbol-macrolet form has declarations, they have to be scoped to that body, and locally will do that. If the expansion of symbol-macrolet does not leave behind this locally wrapping, then declarations will have the wrong scope.)

From this macro expansion you can see what the task is. The macro expander has to walk the code and recognize all binding constructs (all special forms, really), not only binding constructs having to do with the macro system.

Notice how one of the instances of (print x) is left alone: the one which is in the scope of the (let ((x ..)) ...). The other became (print (+ 2 2)), in accordance with the symbol macro for x.

Another thing we can learn from this is that macro expansion just substitutes the expansion and removes the symbol-macrolet forms. So the environment that remains is the original one, minus all of the macro material which is scrubbed away in the expansion process. The macro expansion honors all of the lexical bindings, in one big "Grand Unified" environment, but then graciously vaporizes, leaving behind just the code like (print (+ 2 2)) and other vestiges like the (locally ...), with just the non-macro binding constructs resulting in a reduced version of the original environment.

Thus now when the expanded code is evaluated, just the reduced environment's run-time personality comes into play. The let bindings are instantiated and stuffed with initial values, etc. During expansion, none of that was happening; the non-macro bindings just lie there asserting their scope, and hinting at a future existence in the run time.

4
votes

What you're missing is that this symmetry breaks down when you separate analysis from evaluation, which is what all practical Lisp implementations do. Macro expansion would occur during the analysis phase so eval can be kept simple.

2
votes

I really recommend to have some of the Lisp books handy. Recommended is for example Christian Queinnec, Lisp in Small Pieces. The book is about the implementation of Scheme.

http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/LiSP.html

Chapter 9 is about macros: http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/chap9.html

1
votes

For what its worth, the Scheme R5RS section Binding constructs for syntactic keywords has this to say about it:

Let-syntax and letrec-syntax are analogous to let and letrec, but they bind syntactic keywords to macro transformers instead of binding variables to locations that contain values.

See: http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-7.html#%_sec_4.3.1

This seems to imply a separate strategy should be used, at least for the syntax-rules macro system.


#;1> let
Error: unbound variable: let
#;1> (define let +)
#;2> (let ((talk "hello!")) (write talk))
"hello!"
#;3> let
#<procedure C_plus>
#;4> (let 1 2)
Error: (let) not a proper list: (let 1 2)

    Call history:

    <syntax>                (let 1 2)       <--
#;4> (define a let)
#;5> (a 1 2)
3