20
votes

I created a small macro metaprogramming library that implements basic useful constructs such as REPEAT(times, x), IF(value, true, false), tuples, and more.

Most of my implementations work by overloading macros based upon their variadic argument count or through a counter:

// Example:
#define REPEAT_0(x) 
#define REPEAT_1(x) x REPEAT_0(x) 
#define REPEAT_2(x) x REPEAT_1(x)
#define REPEAT_3(x) x REPEAT_2(x)
// ...
// (these defines are generated using an external script)
// ...

#define REPEAT(count, x) CAT(REPEAT_, count)(x)

This works fine, but I've recently come across an extremely interesting implementation of macro recursion by Paul Fultz.

Up until the deferred expression section I had no trouble understanding his article.

I am, however, having a lot of trouble understanding the use of DEFER and OBSTRUCT properly.

Paul implements a very elegant version of REPEAT that does not require script-generated defines like this:

#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)

#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )
#define REPEAT_INDIRECT() REPEAT

//An example of using this macro
#define M(i, _) i
EVAL(REPEAT(8, M, ~)) // 0 1 2 3 4 5 6 7

DEFER, OBSTRUCT and other utilities are implemented as such:

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__

#define A() 123
A() // Expands to 123
DEFER(A)() // Expands to A () because it requires one more scan to fully expand
EXPAND(DEFER(A)()) // Expands to 123, because the EXPAND macro forces another scan

  • When the preprocessor expands a macro, the result is "painted" until the next scan - it will not expand recursively unless an additional scan occurs. Is this correct?

  • Does the EXPAND(...) macro force an additional scan? If so, does this scan allow macros to expand recursively? What's the difference btween EXPAND(...) and DEFER(id)?

    • Does DEFER force two additional scans?
  • What about the OBSTRUCT(...) macro? Does it force two additional scans?

  • Now - why is OBSTRUCT required in the recursive implementation of REPEAT? Why wouldn't DEFER or EXPAND work here?

1
@EdHeal: come on, please - macros have their uses in code generation (and much more), which cannot be done through C++ code. Please don't disrespect macros. Disrespect macro abuse, which is different.Vittorio Romeo
You can't generate code with functions, it doesn't matter if they're inline. Things you can do better with macros: generate enumerations whose value can be converted to/from strings; define terse serialization helpers; avoid code repetition such as std::forward<decltype(something)>(something) to FWD(something); safe wrappers for things like offsetof (I created a wrapper that throws a compilation error if the passed type is not standard-layout); better assertions; no-boilerplate unit tests; and much more. Macros have their uses.Vittorio Romeo
Macro replacement works in four basic steps (ignoring # and ## operators which complicate matters). 1) Each argument is fully macro expanded in isolation from the rest of the program. 2) Formal parameters in the body of the macro are replaced with these expanded arguments. 3) Macro invocation is replaced with this expanded body. 4) The program is pre-processed, from this point onward, as if the expanded body has always been there; this may find more macros.Igor Tandetnik
Note that on step 4, the preprocessor doesn't look backward: if say you have a macro that expands to () it doesn't look back to see if there was a macro name preceding this invocation. That's what makes DEFER work. Step #1 (fully replace each argument in isolation) makes EXPAND work.Igor Tandetnik
FYI: See also the chaos-pp library which implemented this type of thing a long time ago now.Paul Mensonides

1 Answers

28
votes

The use of macros like DEFER, and complicated C macrology in general, depends on understanding how the C preprocessor actually expands macro expressions. It doesn't just attempt to reduce all expression trees the way a conventional programming language does, but rather, it works on a linear token stream, and has an implicit "cursor" at the point where it's currently examining the stream for possible replacements. Within any given "stack frame" of the expansion process, the cursor never moves backwards, and once a token has been passed in the stream it is not examined again.

Walking through the first example of DEFER's operation:

 DEFER(A)()  // cursor starts at the head of the sequence
^            // identifies call to DEFER - push current position

 DEFER( A )()  // attempt to expand the argument (nothing to do)
       ^
 // replace occurrences of id in DEFER with A,
 // then replace the call to it with the substituted body

 A EMPTY() ()  // pop cursor position (start of pasted subsequence)
^              // doesn't find an expansion for A, move on
 A EMPTY() ()  // move to next token
  ^            // EMPTY() is a valid expansion
 A  ()         // replace EMPTY() with its body in the same way
   ^           // continuing...
 A ()          // `(` is not a macro, move on
  ^
 A ( )         // `)` is not a macro, move on
    ^
 A ()          // end of sequence, no more expansions
     ^

The cursor moved past A during the "rescan" of DEFER's body, after the arguments had been substituted, which is the second and last attempt to expand that set of tokens. Once the cursor has moved past A it does not return to it during that expansion sequence, and since the "rescan" is at the top level there is no following expansion sequence.

Now consider the same expression, but wrapped in a call to EXPAND:

 EXPAND(DEFER(A)())    // cursor starts at the head etc.
^                      // identifies call to EXPAND

 EXPAND( DEFER(A)() )  // attempt to expand the argument
        ^              // this does the same as the first
                       // example, in a NESTED CONTEXT

 // replace occurrences of __VA_ARGS__ in EXPAND with A ()
 // then replace the call with the substituted body

 A ()          // pop cursor position (start of pasted subsequence)
^              // identifies A, and can expand it this time

Because argument lists are expanded in a stacked context, and the cursor position is restored to the position in front of the original call for the rescan pass, placing a macro invocation in any macro's argument list - even one that actually does nothing, like EXPAND - gives it a "free" extra run over by the expansion cursor, by resetting the cursor's position in the stream an extra time for an extra rescan pass, and therefore giving each freshly constructed call expression (i.e. the pushing together of a macro name and a parenthesized argument list) an extra chance at being recognised by the expander. So all EVAL does is give you 363 (3^5+3^4+3^3+3^2+3, someone check my math) free rescan passes.

So, addressing the questions in light of this:

  • "painting blue" doesn't work quite like that (the explanation in the wiki is a bit misleadingly phrased, although it's not wrong). The name of a macro, if generated within that macro, will be painted blue permanently (C11 6.10.3.4 "[blue] tokens are no longer available for further replacement even if they are later (re)examined"). The point of DEFER is rather to ensure that the recursive invocation doesn't get generated on the macro's expansion pass, but instead is ...deferred... until an outer rescan step, at which point it won't get painted blue because we're no longer within that named macro. This is why REPEAT_INDIRECT is function-like; so that it can be prevented from expanding into anything mentioning the name of REPEAT, as long as we're still within the body of REPEAT. It requires at least one further free pass after the originating REPEAT completes to expand away the spacing EMPTY tokens.
  • yes, EXPAND forces an additional expansion pass. Any macro invocation grants one extra expansion pass to whatever expression was passed in its argument list.
    • the idea of DEFER is that you don't pass it a whole expression, just the "function" part; it inserts a spacer between the function and its argument list that costs one expansion pass to remove.
    • therefore the difference between EXPAND and DEFER is that DEFER imposes the need for an extra pass, before the function you pass it gets expanded; and EXPAND provides that extra pass. So they are each other's inverse (and applied together, as in the first example, are equivalent to a call using neither).
  • yes, OBSTRUCT costs two expansion passes before the function you pass it gets expanded. It does this by DEFERing the expansion of the EMPTY() spacer by one (EMPTY EMPTY() ()), burning the first cursor reset on getting rid of the nested EMPTY.
  • OBSTRUCT is needed around REPEAT_INDIRECT because WHEN expands (on true) to a call to EXPAND, which will burn away one layer of indirection while we're still within the call to REPEAT. If there was only one layer (a DEFER), the nested REPEAT would be generated while we're still in REPEAT's context, causing it to be painted blue and killing the recursion right there. Using two layers with OBSTRUCT means that the nested REPEAT won't be generated until we reach the rescan pass of whatever macro call is outside REPEAT, at which point we can safely generate the name again without it being painted blue, because it's been popped from the no-expand stack.

So, this method of recursion works by using an external source of a large number of rescan passes (EVAL), and deferring the expansion of a macro's name within itself by at least one rescan pass so it happens further down the call stack. The first expansion of the body of REPEAT happens during the rescan of EVAL[363], the recursive invocation is deferred until the rescan of EVAL[362], the nested expansion deferred... and so on. It's not true recursion, as it can't form an infinite loop, but instead relies on an external source of stack frames to burn through.