You are in the right direction, but it is useful to learn some of the correct terminology. Knowing the terminology will help you understand and use the text books which contain much more helpful information.
In the rule TYPE
the name ID
would be known as a terminal symbol, whereas TYPE
would be called a non-terminal symbol of the grammar. A grammar is the set of rules used to describe the language. Each grammar rule defines a non-terminal symbol. Ultimately each non-terminal will be described as an arrangement of terminal symbols in a unique (non ambiguous) arrangement.
A terminal symbol is represented by a token which stands for whatever is its concrete representation. A concrete representation is what is actually typed on the keyboard. The sequence of characters that make up the concrete representation is known as a lexeme. The matching of the lexemes to create the tokens is the task performed by the lexical analyser (or scanner). The matching of the sequence of tokens to the non-terminal in the grammar rules is known as parsing.
Flex is the tool that produces the lexical analyser and bison is the tool that produces the parser.
So, TYPE
is not a token but a non-terminal symbol (otherwise known a grammar rule name). It is not "produced" by flex or bison, but by the writer of the grammar. The parser produced by bison will reduce the sequences of terminals and non-terminals to the non-terminal called TYPE
.
To avoid confusion between tokens, terminals, non-terminals and rules there is an unwritten convention that tokens or terminal symbols are written either in CAPS or as character constants. For example:
ID, '<'
Non-terminals or syntax rules are written in lower case to avoid the confusion with the former. For example:
type, expression
So, an experienced user of bison might write:
type_signature : ID
| ID '<' type_signature '>'
;
That way the nature of each name is much clearer.
Now to discuss whitespace. No, flex and bison do not ignore (automatically or otherwise) whitespace, tabs, newlines, carriage returns and other non visible characters, but it's complicated (as is your question).
There are the whitespace characters that occur in the concrete representation handled by the lexical analyser, and there are the whitespace characters that occur in the sets of rules that describe the language processing to the tools flex and bison. A language that you wish to process may contain syntactically (or even semantically) significant whitespace characters (an extreme example is the language called WhiteSpace). It is possible to write the parser for that language using flex and bison, so it cannot ignore all whitespace in its input description file or the input langauge! Rather than go into too much details here, it is just worth noting that in certain places whitespace is ignored and in other places it is particularly significant; until one is more experienced you should be careful. Whitespace is more sensitive in the lexer rules file than the parser rules file.