Revised question
The value on the Yacc stack is controlled by YYSTYPE or %union. Use YYSTYPE when the type information is simple; use %union when it is complex.
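For the complex case, a rough sketch in plain C of what a %union declaration amounts to may help. The member names ival and sval and the function add_values() are invented for illustration; the exact code a given Yacc emits differs in detail.

```c
#include <assert.h>

/* Roughly what Yacc generates for:  %union { int ival; char *sval; }  */
typedef union
{
    int   ival;   /* e.g. for tokens declared  %token <ival> NUMBER  */
    char *sval;   /* e.g. for tokens declared  %token <sval> NAME    */
} YYSTYPE;

/* What an action such as  $$ = $1 + $3;  boils down to when all three
** symbols are typed <ival>: Yacc selects the union member for you. */
YYSTYPE add_values(YYSTYPE lhs, YYSTYPE rhs)
{
    YYSTYPE result;
    assert(sizeof(result) >= sizeof(int));
    result.ival = lhs.ival + rhs.ival;
    return result;
}
```

With a single #define YYSTYPE, by contrast, every symbol carries the same type and no per-symbol type tags are needed.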
One of my grammars contains:
struct Token
{
    int toktype;
    char *start;
    char *end;
};
typedef struct Token Token;
#define YYSTYPE Token
For a variety of reasons (not necessarily good ones), my grammar uses a hand-crafted lexical analyzer instead of Lex.
In the grammar rules, you refer to items like NAME in your example as $1 (where the actual number depends on where the item appears in the list of terminals and non-terminals that make up the rule).
For example (same grammar):
disconnect
    :   K_DISCONNECT K_CURRENT
        { conn->ctype = CONN_CURRENT; }
    |   K_DISCONNECT K_ALL
        { conn->ctype = CONN_ALL; }
    |   K_DISCONNECT K_DEFAULT
        { conn->ctype = CONN_DEFAULT; }
    |   K_DISCONNECT string
        {   conn->ctype = CONN_STRING;
            set_connection(conn, $2.start, $2.end);
        }
    ;
And:
load
    :   K_LOAD K_FROM opt_file_pipe string load_opt_list K_INSERT
        {
            set_string("load file", load->file, sizeof(load->file),
                       $4.start, $4.end);
            load->stmt = $6.start;
        }
    ;
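The actions above hand $N.start and $N.end to helper functions. As a minimal sketch of what such a helper could look like, assuming the two pointers delimit the half-open range [start, end) of the token text: copy_token() below is a hypothetical stand-in, not the set_string() or set_connection() used in the grammar.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Token
{
    int   toktype;
    char *start;
    char *end;
};
typedef struct Token Token;

/* Hypothetical helper: copy the token text [start, end) into buf,
** truncating if it does not fit, and always null-terminating.
** Returns the number of characters copied (excluding the null). */
size_t copy_token(char *buf, size_t buflen, const char *start, const char *end)
{
    size_t len;
    assert(buf != 0 && buflen > 0 && start != 0 && end >= start);
    len = (size_t)(end - start);
    if (len >= buflen)
        len = buflen - 1;
    memcpy(buf, start, len);
    buf[len] = '\0';
    return len;
}
```

Because the Token only points into the original input, nothing is copied until an action actually needs the text.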
I don't know whether seeing the outline of the hand-crafted yylex() helps; in the grammar, it is a function in the same file as yyparse().
static const char *c_token; /* Where to start next token search */
static int yylex(void)
{
    char buffer[MAX_LEXTOKENLENGTH];
    const char *start;
    if (c_token == 0)
        abort();
    if (bare_filename_ok)
        start = scan_for_filename(c_token, &c_token);
    else
        start = sqltoken(c_token, &c_token);
    yylval.start = CONST_CAST(char *, start);
    yylval.end = CONST_CAST(char *, c_token);
    if (*start == '\0')
    {
        yylval.toktype = 0;
        return yylval.toktype;
    }
    set_token(buffer, sizeof(buffer), start, c_token);
#ifdef YYDEBUG
    if (YYDEBUGVAR > 1)
        printf("yylex(): token = %s\n", buffer);
#endif /* YYDEBUG */
    /* printf("yylex(): token = %s\n", buffer); */
    if (isalpha((unsigned char)buffer[0]) || buffer[0] == '_')
    {
        Keyword kw;
        Keyword *p;
        kw.keyword = buffer;
        p = (Keyword *)bsearch(&kw, keylist, DIM(keylist), sizeof(Keyword),
                               kw_compare); /*=C++=*/
        if (p == 0)
            yylval.toktype = S_IDENTIFIER;
        else
            yylval.toktype = p->token;
    }
    else if (buffer[0] == '\'')
    {
        yylval.toktype = S_SQSTRING;
    }
    else if (buffer[0] == '"')
    {
        yylval.toktype = S_DQSTRING;
    }
    else if (isdigit((unsigned char)buffer[0]))
    {
        yylval.toktype = S_NUMBER;
    }
    else if (buffer[0] == '.' && isdigit((unsigned char)buffer[1]))
    {
        yylval.toktype = S_NUMBER;
    }
    ...various single-character symbols recognized...
    else if (buffer[0] == ':')
    {
        assert(buffer[1] == '\0');
        yylval.toktype = C_COLON;
    }
    else
    {
        yylval.toktype = S_ERROR;
    }
    return yylval.toktype;
}
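The keyword lookup in the middle of yylex() depends on definitions that are not shown. The following is a self-contained sketch of how they might fit together. The layout of Keyword, the contents of keylist, the body of kw_compare(), the token numbers, and the wrapper classify_word() are all assumptions made for illustration; only the names Keyword, keylist, DIM, and kw_compare come from the code above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DIM(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical token numbers; Yacc would normally generate these. */
enum { S_IDENTIFIER = 258, K_ALL, K_CURRENT, K_DEFAULT, K_DISCONNECT, K_LOAD };

typedef struct Keyword
{
    const char *keyword;
    int         token;
} Keyword;

/* bsearch() requires the table to be sorted by the comparison key. */
static const Keyword keylist[] =
{
    { "ALL",        K_ALL        },
    { "CURRENT",    K_CURRENT    },
    { "DEFAULT",    K_DEFAULT    },
    { "DISCONNECT", K_DISCONNECT },
    { "LOAD",       K_LOAD       },
};

/* Order two Keyword entries by their keyword text (case-sensitive). */
static int kw_compare(const void *v1, const void *v2)
{
    const Keyword *k1 = (const Keyword *)v1;
    const Keyword *k2 = (const Keyword *)v2;
    return strcmp(k1->keyword, k2->keyword);
}

/* Return the keyword's token number, or S_IDENTIFIER if word is not a keyword. */
int classify_word(const char *word)
{
    Keyword kw;
    const Keyword *p;
    assert(word != 0);
    kw.keyword = word;
    kw.token = 0;
    p = (const Keyword *)bsearch(&kw, keylist, DIM(keylist), sizeof(Keyword),
                                 kw_compare);
    return (p == 0) ? S_IDENTIFIER : p->token;
}
```

This sketch compares case-sensitively, so "load" is classified as an identifier; a real SQL-style lexer would probably fold case before the lookup or use a case-insensitive comparison.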
Original question
The variable yytext is normally a global variable; your Yacc code uses one of two possible declarations:
extern char *yytext; /* Correct for Flex */
extern char yytext[]; /* Correct for traditional Lex */
Which of those is correct depends on how your version of Lex defines it.
If you want to add a length (perhaps yytextlen), then you can define such a variable and have every return from yylex() ensure that yytextlen is set. Alternatively, you can arrange for your grammar to call wwlex(), and your wwlex() simply does:
int wwlex(void)
{
    int rc = yylex();
    yytextlen = strlen(yytext);
    return rc;
}
Or you can arrange for Lex to generate its scanner function under a different name, have Yacc continue to call yylex(), and provide the code above as yylex(), calling the renamed Lex function. Either way works.
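As a sketch of the second arrangement, the code below stands in for the renamed scanner with a tiny fake that returns canned tokens, so the wrapper can be exercised on its own. The name lex_yylex(), the token number WORD, and the canned input are all hypothetical; in a real build the generated scanner would define yytext and the renamed function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for what a Lex-generated scanner provides. In a real build,
** yytext would be declared by the scanner, not here. */
char  *yytext;
size_t yytextlen;

enum { WORD = 258 };   /* hypothetical token number */

static char   fake_tokens[][8] = { "select", "x" };
static char   fake_empty[] = "";
static size_t next_token = 0;

/* Hypothetical renamed scanner: returns WORD per token, 0 at end of input. */
static int lex_yylex(void)
{
    if (next_token >= sizeof(fake_tokens) / sizeof(fake_tokens[0]))
    {
        yytext = fake_empty;
        return 0;
    }
    yytext = fake_tokens[next_token++];
    return WORD;
}

/* The function Yacc calls: delegate to the scanner, then record the length. */
int yylex(void)
{
    int rc = lex_yylex();
    assert(yytext != 0);
    yytextlen = strlen(yytext);
    return rc;
}
```

How the rename is achieved depends on the Lex in use; with Flex, defining the YY_DECL macro is one way to give the generated function a different name, but check your tool's documentation before relying on that.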
yyleng instead of yytext). – Jonathan Leffler