How Should We Interpret a Macro with an Embedded Comma

Question

How should the following macro definitions be interpreted according to the C++ standard? The main issue is that the replacement list for AA contains an embedded comma (for, S).

#define AA for, S    //<---note the embedded comma
#define VALUE_TO_STRING(x) ^x!
#define VALUE(x) VALUE_TO_STRING(x)

int _tmain(int argc, _TCHAR* argv[])
{
    VALUE(AA)
    return 0;
}

I tested this with VC++2010, and the preprocessed output looks like the following, with no error. However, I have trouble working out which steps lead to this result under the C++03 (or C++11) standard:

int wmain(int argc, _TCHAR* argv[])
{
    ^for, S!
    return 0;
}

I've done some step by step tests with VC++2010. First I commented out the 2nd macro to see what was happening in the first step:

#define AA for, S
//#define VALUE_TO_STRING(x) ^x!
#define VALUE(x) VALUE_TO_STRING(x)

The macro replacement is straightforward and yields a sequence that looks like an invocation of another function-like macro with TWO arguments:

int wmain(int argc, _TCHAR* argv[])
{
    VALUE_TO_STRING(for, S)
    return 0;
}

According to [cpp.rescan], the next step is to rescan this sequence for more macro names. The question is whether this new invocation should be interpreted as a function-like macro with 2 arguments or with 1 argument, "for, S".

The normal interpretation is that VALUE_TO_STRING() is given 2 arguments, which is invalid, so a preprocessor error results. So how did VC++ come up with a result without any error? Apparently the second step VC++ took was to treat for, S as a single argument, which doesn't make sense and isn't defined by the C++ standard.

I did not downvote. I just now upvoted this question, which in my opinion is quite hard, but clear and useful. — Yunnosch

H Walters · Accepted Answer · 2017-03-28T05:45:08

I've done a test with VC++2010...

MS's preprocessor was never made standard-conforming. They phrase it this odd way:

C99 __func__ and Preprocessor Rules ... For C99 preprocessor rules, "Partial" is listed because variadic macros are supported.

In other words, "we support variadic macros; therefore we qualify as partially compliant". AFAIK standard compliance for the preprocessor is considered very low priority by the MS team. So I wouldn't tend to use VC or VC++ as a model of the standard preprocessor. gcc's a better model of the standard preprocessor here.

Since this is about the preprocessor I'm going to focus the story on just this snippet:

#define AA for, S
#define VALUE_TO_STRING(x) ^x!
#define VALUE(x) VALUE_TO_STRING(x)
VALUE(AA)

I'll be referencing ISO/IEC 14882:2011 here, which uses different section numbers than the 1998/2003 editions. Using those numbers, here's what happens starting at the expansion step, step by step, skipping the steps that aren't relevant here.

The preprocessor sees VALUE(AA), which is a function-like invocation of a previously defined function-like macro. So the first thing it does is argument identification, referencing 16.3 paragraph 4:

[if not variadic] the number of arguments (including those arguments consisting of no preprocessing tokens) in an invocation of a function-like macro shall equal the number of parameters in the macro definition

...and a portion of 16.3.1 paragraph 1:

After the arguments for the invocation of a function-like macro have been identified,

At this step, the preprocessor identifies that there is indeed one argument, that the macro was defined with one parameter, and that the parameter x corresponds to the invocation argument AA. So far, all that has happened is argument matching: x is AA.
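To see that argument identification works on the tokens as written, before any expansion, here is a minimal sketch. SHOW_RAW and the two helper functions are hypothetical names I'm introducing for illustration; they are not part of the question. Because operands of # are never macro-expanded, stringifying the argument shows what the preprocessor counted as "the argument".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the question): stringifies its argument
// exactly as written, since operands of # are not macro-expanded.
#define SHOW_RAW(x) #x

#define AA for, S

// AA is one argument at identification time; the comma inside AA's
// replacement list has not been looked at yet.
inline std::string raw_argument() { return SHOW_RAW(AA); }

// A comma nested inside parentheses does not separate arguments either.
inline std::string parenthesized_argument() { return SHOW_RAW((for, S)); }

inline void demo_argument_identification() {
    assert(raw_argument() == "AA");
    assert(parenthesized_argument() == "(for, S)");
}
```

Calling demo_argument_identification() from main should pass silently on a conforming preprocessor: in both invocations the macro receives exactly one argument.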

Then we get to the next step, which is argument expansion. With respect to this step, the only things about the replacement list that matter are where the parameters appear in it and whether each occurrence is an operand of stringification (# x) or pasting (x ## ... or ... ## x). Any parameter occurrence that is neither has its corresponding argument fully macro-expanded (stringified or pasted occurrences don't count during this step). This expansion happens first, before anything else interesting goes on in the invocation, and it occurs just as if the preprocessor were expanding the argument on its own.

In this case, the replacement list is VALUE_TO_STRING(x). Again, VALUE_TO_STRING might be a function-like macro, but since we're doing argument expansion right now we really don't care. The only thing we care about is that x is there, and that it's not being stringified or pasted. The argument for x is AA, so the preprocessor evaluates AA as if AA stood alone rather than inside VALUE(AA). AA is an object-like macro that expands to for, S. So the replacement list transforms into VALUE_TO_STRING(for, S).

This is the rest of 16.3.1 paragraph 1 in action:

A parameter in the replacement list, unless [stringified or pasted] is replaced by the corresponding argument after all macros contained therein have been expanded [...] as if they formed the rest of the preprocessing file
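The difference between a stringified parameter and a plain one can be observed directly. This is only a sketch: AA_NO_COMMA, STR_RAW and STR_EXPANDED are hypothetical names, and I've deliberately dropped the comma from the replacement list so that the example compiles and isolates the argument expansion step.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for AA from the question, without the comma
// (an assumption made so that this example stays well-formed).
#define AA_NO_COMMA for_S

// x is the operand of #, so the argument is NOT expanded first.
#define STR_RAW(x) #x

// x is neither stringified nor pasted here, so the argument is fully
// macro-expanded before it is substituted into the replacement list.
#define STR_EXPANDED(x) STR_RAW(x)

inline std::string stringified_directly() { return STR_RAW(AA_NO_COMMA); }
inline std::string expanded_first() { return STR_EXPANDED(AA_NO_COMMA); }

inline void demo_argument_expansion() {
    assert(stringified_directly() == "AA_NO_COMMA");
    assert(expanded_first() == "for_S");
}
```

STR_EXPANDED plays the same role as VALUE in the question: by the time its replacement list is rescanned, the argument has already been replaced by its expansion.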

So far so good. But now we reach the next part, in 16.3.4:

After all parameters in the replacement list have been substituted and [stuff not happening here] the resulting preprocessing token sequence is rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.

This part evaluates VALUE_TO_STRING(for, S) as if that were the preprocessing token sequence (except that it also temporarily forgets that VALUE is a macro per 16.3.4 paragraph 2, but that doesn't come into play here). That evaluation recognizes VALUE_TO_STRING as a function-like macro that is being invoked like one, so argument identification begins again. Only here, VALUE_TO_STRING was defined with one parameter but is invoked with two arguments. That violates 16.3 paragraph 4.
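For contrast, here is a hedged sketch of the same two-level pattern where the inner macro is variadic, so the comma that shows up during the rescan is acceptable. The names ONE_PARAM, ANY_PARAMS, VALUE_STRICT and VALUE_VARIADIC are hypothetical, the inner macros stringify instead of using ^x! so the result can be inspected, and variadic macros assume C++11 (or C99) rather than C++03.

```cpp
#include <cassert>
#include <string>

#define AA for, S

// Same shape as the question: the inner macro has exactly one parameter.
#define ONE_PARAM(x) #x
#define VALUE_STRICT(x) ONE_PARAM(x)
// VALUE_STRICT(AA)   // rescans as ONE_PARAM(for, S): two arguments for
//                    // one parameter, which a conforming preprocessor rejects

// Variadic inner macro (assumes C++11): the rescanned invocation
// ANY_PARAMS(for, S) is fine because ... absorbs both arguments.
#define ANY_PARAMS(...) #__VA_ARGS__
#define VALUE_VARIADIC(x) ANY_PARAMS(x)

inline std::string variadic_result() { return VALUE_VARIADIC(AA); }

inline void demo_rescan() {
    const std::string r = variadic_result();
    // Checked loosely: the string starts with "for", contains the comma,
    // and ends with "S" (gcc and clang typically produce "for, S").
    assert(r.find("for") == 0);
    assert(r.find(',') != std::string::npos);
    assert(!r.empty() && r.back() == 'S');
}
```

Uncommenting the VALUE_STRICT(AA) line is a quick way to check how a given compiler handles the case from the question.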
