8
votes

I'm trying to define a new language in racket, let's call it wibble. Wibble will allow modules to be loaded so it has to translate it's forms to Racket require forms. But I'm having trouble getting require to work when used in a language extension. I eventually tracked down my problems to the following strange behaviour.

Here's my reader which redefines read and read-syntax

=== wibble/lang/reader.rkt ===
#lang racket/base

(provide (rename-out (wibble-read read) (wibble-read-syntax read-syntax)))

(define (wibble-read in)
  (wibble-read-syntax #f in))

(define (wibble-read-syntax src in)
  #`(module #,(module-name src) wibble/lang
      #,@(read-all src in)))

(define (module-name src)
  (if (path? src)
      (let-values (((base name dir?) (split-path src)))
        (string->symbol (path->string (path-replace-suffix name #""))))
      'anonymous-module))

(define (read-all src in)
  (let loop ((all '()))
    (let ((obj (read-syntax src in)))
      (if (eof-object? obj)
          (reverse all)
          (loop (cons obj all))))))

and here's my much simplified language module, this introduces (require racket/base) into each wibble module

=== wibble/lang.rkt ===
#lang racket/base

(require (for-syntax racket/base))

(provide (rename-out (wibble-module-begin #%module-begin)) #%app #%datum #%top)

(define-syntax wibble-module-begin
  (lambda (stx)
    (syntax-case stx ()
      ((_ x ...) #`(#%module-begin (require #,(datum->syntax stx 'racket/base)) x ...)))))

With the above code then this wibble code 'works', i.e. there are no errors

#lang wibble
(cons 1 2)
(cons 3 4)

but the following

#lang wibble
(cons 1 2)

gives error message cons: unbound identifier in module in: cons

Really I'm just looking for an explanation as to what going on. I'm sure the difference is related to this from the racket docs (Racket Reference 3.1)

If a single form is provided, then it is partially expanded in a module-begin context. If the expansion leads to #%plain-module-begin, then the body of the #%plain-module-begin is the body of the module. If partial expansion leads to any other primitive form, then the form is wrapped with #%module-begin using the lexical context of the module body; this identifier must be bound by the initial module-path import, and its expansion must produce a #%plain-module-begin to supply the module body. Finally, if multiple forms are provided, they are wrapped with #%module-begin, as in the case where a single form does not expand to #%plain-module-begin.

but even with that I don't understand why having a single form makes any difference, it's seems to be somthing to do with the timing of partial expansion but I'm not really sure. Nor do I understand why Racket treats a single form as a special case.

Incidentally I can fix the problem with a slight modification to my reader

(define (wibble-read-syntax src in)
  #`(module #,(module-name src) wibble/lang
      #,@(read-all src in) (void)))

Hard-coding a (void) form means I always have more than one form and eveything works.

Sorry for the long post, I'm just looking for some understanding of how this stuff works.

2

2 Answers

5
votes

Alright, I think that I've figured it out.

Your intuition is correct in that the problem lies within the timing of the partial expansion of the single-form module body. Inside of your reader.rkt file, you produce a (module ...) form. As the quoted excerpt from your question states, the forms ... portion of this is then treated specially, since there is only one. Let's take a look at an excerpt from the documentation on partial expansion:

As a special case, when expansion would otherwise add an #%app, #%datum, or #%top identifier to an expression, and when the binding turns out to be the primitive #%app, #%datum, or #%top form, then expansion stops without adding the identifier.

I am almost certain that the partial expansion which occurs at this point does something to the cons identifier. This is the one part that I remain unsure of... my gut tells me that what's happening is that the partial expansion is attempting to find the binding for the cons identifier (since it is the first part of the parentheses, the identifier could be bound to a macro which should be expanded, so that needs to be checked) but is unable to, so it throws a tantrum. Note that even if cons has no phase 1 (syntax-expansion time) binding, the macro expander still expects there to be a phase 0 (runtime) binding for the identifier (among other things, this helps the expander remain hygienic). Because all of this partial expansion happens to the body of your (module ...) form (which is done before your (#%module-begin ...) form where you inject the (#%require ...) form), cons has no binding during the expansion, so the expansion, I believe, fails.

Nevertheless, a naive fix for your problem is to rewrite wibble-read-syntax as follows:

(define (wibble-read-syntax src in)
  (let* ((read-in (read-all src in))
         (in-stx (and (pair? read-in) (car read-in))))
    #`(module #,(module-name src) wibble/lang
        (require #,(datum->syntax in-stx 'racket/base))
        #,@read-in))

You can then remove the (#%require ...) form from your (#%module-begin ...) macro.

That's not, in my opinion, the best way to fix the issue, however. As a matter of cleanliness, hard-coding in a require form like you've done in wibble/lang.rkt would make Eli Barzilay and co. cry. A much simpler way to do what you are trying to do is by updating your lang.rkt file to something like so:

=== wibble/lang.rkt ===
#lang racket/base

(require (for-syntax racket/base))

(provide (rename-out (wibble-module-begin #%module-begin))
         (except-out (all-from-out racket/base) #%module-begin #%app #%datum #%top)
     #%app #%datum #%top)

(define-syntax wibble-module-begin
  (lambda (stx)
    (syntax-case stx ()
      ((_ x ...) #`(#%module-begin  x ...)))))

Writing in this convention removes the need for any hard-coded (require ...) forms and prevents subtle bugs like the one you've unearthed from occuring. If you are confused why this works, remember that you've already provided the #%module-begin identifier using this file, which is subsequently bound in all #lang wibble files. In principle, there is no limit on what identifiers you can bind in this fashion. If you would like some further reading, here's a shameless self-advertisement for a blog post I wrote a little while back on the subject.

I hope I've helped.

2
votes

The problem is with the require (though I'm not sure I 100% understand all the behavior).

(require X) imports bindings from X with the lexical context of #'X. #'X here has the context of stx, which is the entire #'(module-begin x ...), which is not the context you want. You want the context of one of the cons expressions, i.e., one of the #'xs.

Something like this should work:

(define-syntax wibble-module-begin
  (lambda (stx)
    (syntax-case stx ()
      [(_) #'(#%module-begin)]
      [(m x y ...)
       #`(#%module-begin
          (require #,(datum->syntax #'x 'racket/base))
          x y ...)])))

Though, as @belph warned, there's probably a more idiomatic way to accomplish what you want.

The behavior of your original program, and as you intuited, likely has to do with module's different treatment of single and multi sub-forms, but I think the "working" case might be an accident and could be a bug in the racket compiler.