I am still having some trouble with this concept. The key paragraph in the R7RS standard is:

"Identifiers that appear in the template but are not pattern variables or the identifier ellipsis are inserted into the output as literal identifiers. If a literal identifier is inserted as a free identifier then it refers to the binding of that identifier within whose scope the instance of syntax-rules appears. If a literal identifier is inserted as a bound identifier then it is in effect renamed to prevent inadvertent captures of free identifiers."

By "bound identifier", am I right that it means any argument to a lambda, a top-level define, or a syntax definition, i.e. define-syntax, let-syntax or letrec-syntax? (I think I could handle internal defines with a trick at compile time, converting them to lambdas.)

By "free identifier", does it mean any other identifier, one that was presumably defined beforehand by a "bound identifier" expression?

I wonder about the output of code like this:

(define x 42)

(define-syntax double syntax-rules ()
    ((_) ((lambda () (+ x x)))))

(set! x 3)
(double)

Should the result be 84 or 6?

What about this:

(define x 42)

(define-syntax double syntax-rules ()
    ((_) ((lambda () (+ x x)))))

(define (proc)
    (define x 3)
    (double))
(proc)

Am I right to suppose that, since define-syntax occurs at top level, all its free references refer to top-level variables that may or may not exist at the point of definition? To avoid collisions with local variables at the point of use, we would then rename the free reference in the output, say by appending a '%' to the name (and disallowing the user from creating symbols with % in them), and also duplicate the top-level variable under the name with the % added.

If a macro is defined in some form of nested scope (with let-syntax or letrec-syntax), this is even trickier if it refers to variables from that scope. At a use of the macro, these references have to resolve to the bindings visible at the point of definition of the macro rather than at the point of use. So I'm guessing the best way is to expand it naturally and scan the result for lambdas; if one is found, rename its arguments, as the R7RS suggests. But what about references to those arguments inside the lambda body: should we rename these as well? This seems obvious but was not explicitly stated in the standard.

Also, I'm still not sure whether it is best to have an expansion phase separate from the compiler, or to interleave macro expansion with compilation.

Thanks, and excuse me if I've missed something obvious; I'm relatively new to this.

Steve

2 Answers

2
votes

In your first example, properly written:

(define x 42)

(define-syntax double
  (syntax-rules ()
    ((_) ((lambda () (+ x x))))))

(set! x 3)
(double)

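the only possibility is 6, as there is only one variable called x: the set! changes the global x to 3 before (double) is evaluated, and the expansion reads that same variable.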

In your second example, properly written:

(define x 42)

(define-syntax double
  (syntax-rules ()
    ((_) ((lambda () (+ x x))))))

(define (proc)
  (define x 3)
  (double))

(proc)

the hygienic nature of the Scheme macro system prevents capture of the unrelated local x, so the result is 84.

In general, identifiers (like your x) inserted by a syntax-rules template refer to whatever they refer to lexically where the macro is defined (the global x in your case). Hygiene preserves that binding, so you do not have to worry about unintended capture.
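
The two cases in the quoted R7RS paragraph can be seen side by side in a small sketch. The macro my-or and the variable tmp below are hypothetical names chosen for illustration, not something from the question, and the snippet assumes it is entered at the REPL of an R7RS-conforming implementation with the standard bindings available:

```scheme
;; Hypothetical macro for illustration only.
(define-syntax my-or
  (syntax-rules ()
    ((_ a b)
     (let ((tmp a))         ; tmp is inserted as a BOUND identifier, so it is renamed
       (if tmp tmp b)))))   ; let and if are inserted as FREE identifiers

(define tmp 5)

;; The user's tmp (argument b) is not captured by the macro's own tmp.
(define result-1 (my-or #f tmp))                  ; 5

;; Shadowing if at the use site does not change what the macro's if means.
(define result-2 (let ((if list)) (my-or #f 7)))  ; 7
```

Without hygiene, the first use would read the macro's tmp (which is #f) and the second would call list instead of the conditional.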

-1
votes

Thanks, I think I understand... I still wonder how hygiene is achieved in certain advanced circumstances, e.g. the following:

(define (myproc x)
  (let-syntax ((double (syntax-rules ()
                         ((double) (+ x x)))))
    ((lambda (x) (double)) 3)))

(myproc 42)

The site comes up with 84 rather than 6. I wonder how this (correct) referential transparency is achieved just by renaming. The transformer output does not bind any new variables, yet when it expands on line 4 we still have to find a way to get to the desired x rather than the most recent one.

The best way I can think of is simply to rename every time a lambda argument or definition shadows another, i.e. keep appending %1, %2 and so on. Macro outputs will then carry the exact version they were written against (e.g. x%1), while ordinary references keep their unadorned name x and the correct variable is found at compile time.
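
To make that concrete, here is what the example would look like after renaming by hand. This is only a sketch of the idea described above, not how any particular implementation works; the name myproc-renamed and the %1/%2 suffixes are made up for illustration:

```scheme
;; Hand-expanded, hand-renamed version of myproc (hypothetical names).
;; The outer parameter becomes x%1 and the inner lambda's parameter x%2.
(define (myproc-renamed x%1)
  ;; The template (+ x x) was written where x meant the outer parameter,
  ;; so the expansion of (double) refers to x%1, not to x%2.
  ((lambda (x%2) (+ x%1 x%1)) 3))

(define renamed-result (myproc-renamed 42))  ; 84
```

Once every binding has a distinct name, the inner lambda can no longer capture the reference coming from the macro.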

Thanks, I hope for any clarification.

Steve