5
votes

What are FIRST and FOLLOW sets? What are they used for in parsing? Are they used for top-down or bottom-up parsers?

Can anyone explain me FIRST and FOLLOW SETS for the following set of grammar rules:

E := E+T | T
T := T*V | T
V := <id>
3
Does this answer your question? Purpose of FIRST and FOLLOW sets in LL(1) parsers?ggorlen

3 Answers

3
votes

They are typically used in LL (top-down) parsers to check if the running parser would encounter any situation where there is more than one way to continue parsing.

If you have the alternative A | B and also have FIRST(A) = {"a"} and FIRST(B) = {"b", "a"} then you would have a FIRST/FIRST conflict because when "a" comes next in the input you wouldn't know whether to expand A or B. (Assuming lookahead is 1).

On the other side if you have a Nonterminal that is nullable like AOpt: ("a")? then you have to make sure that FOLLOW(AOpt) doesn't contain "a" because otherwise you wouldn't know if to expand AOpt or not like here: S: AOpt "a" Either S or AOpt could consume "a" which gives us a FIRST/FOLLOW conflict.

FIRST sets can also be used during the parsing process for performance reasons. If you have a nullable nonterminal NullableNt you can expand it in order to see if it can consume anything, or it may be faster to check if FIRST(NullableNt) contains the next token and if not simply ignore it (backtracking vs predictive parsing). Another performance improvement would be to additionally provide the lexical scanner with the current FIRST set, so the scanner does not try all possible terminals but only those that are currently allowed by the context. This conflicts with reserved terminals but those are not always needed.

Bottom up parsers have different kinds of conflicts namely Reduce/Reduce and Shift/Reduce. They also use item sets to detect conflicts and not FIRST,FOLLOW.

Your grammar would't work with LL-parsers because it contains left recursion. But the FIRST sets for E, T and V would be {id} (assuming your T := T*V | T is meant to be T := T*V | V).

1
votes

Answer :

E->E+T|T

left recursion

E->TE'

E'->+TE'|eipsilon

T->T*V|T

left recursion

T->VT'

T'->*VT'|epsilon

no left recursion in

V->(id)

Therefore the grammar is:

E->TE'

E'->+TE'|epsilon

T->VT'

T'->*VT'|epsilon

V-> (id)

FIRST(E)={(}

FIRST(E')={+,epsilon}

FIRST(T)={(}

FIRST(T')={*,epsilon}

FIRST(V)={(}

Starting Symbol=FOLLOW(E)={$}

E->TE',E'->TE'|epsilon:FOLLOW(E')=FOLLOW(E)={$}

E->TE',E'->+TE'|epsilon:FOLLOW(T)=FIRST(E')={+,$}

T->VT',T'->*VT'|epsilon:FOLLOW(T')=FOLLOW(T)={+,$}

T->VT',T->*VT'|epsilon:FOLLOW(V)=FIRST(T)={ *,epsilon}

Rules for First Sets

If X is a terminal then First(X) is just X!
If there is a Production X → ε then add ε to first(X)
If there is a Production X → Y1Y2..Yk then add first(Y1Y2..Yk) to first(X)
First(Y1Y2..Yk) is either
    First(Y1) (if First(Y1) doesn't contain ε)
    OR (if First(Y1) does contain ε) then First (Y1Y2..Yk) is everything in First(Y1) except for ε  as well as everything in First(Y2..Yk)
    If First(Y1) First(Y2)..First(Yk) all contain ε then add ε to First(Y1Y2..Yk) as well.

Rules for Follow Sets

First put $ (the end of input marker) in Follow(S) (S is the start symbol)
If there is a production A → aBb, (where a can be a whole string) then everything in FIRST(b) except for ε is placed in FOLLOW(B).
If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B)
If there is a production A → aBb, where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
0
votes

Wikipedia is your friend. See discussion of LL parsers and first/follow sets.

Fundamentally they are used as the basic for parser construction, e.g., as part of parser generators. You can also use them to reason about properties of grammars, but most people don't have much of a need to do this.