3
votes

Currently I am playing with DCG in Prolog to parse an XML file. I was able to get the following code snippet which is able to parse a simple XML such as:

<author> <name> <f> arthur </f>
        <m> conan </m>
        <l> doyle </l>
     </name>
     <bday> <d> 22 </d>
        <m> 5  </m>
        <y> 1859 </y>
     </bday>
</author>

<author> <name> <f> william </f>
        <l> shakespeare </l>
     </name>
     <bday> <d> 23 </d>
        <m> 4  </m>
        <y> 1564 </y>
     </bday>
</author>
$

and the DCG is defined as:

xml([E]) --> element(E).
xml([E|L]) --> element(E), xml(L).

element(E) -->  begintag(N), elements(L), endtag(N), {E =.. [N|L]}.

elements(L) --> xml(L).
elements([E]) --> [E].

begintag(N) --> ['<', N, '>'].
endtag(N) -->   ['<', '/', N, '>'].

Would someone please illustrate how the DCG works in this case? I am having a really hard time trying to understand the arguments in the DCG (e.g. [E] in xml([E]); [E|L] in xml([E|L]). ) Thank you!

1

1 Answers

4
votes

Think declaratively: A DCG always describes a list. In a DCG body, comma (",") is read as "and then". So for example, xml//1 describes (by its first rule) either a single element, or (by its second rule) an element and then something that is again described by xml//1. element//1 is a begintag, then elements, then an endtag etc. The arguments used in the DCG heads let you relate the sequence that is described by the DCG bodies to other information, like a subsequence you are particularly interested in. Take for example begintag//1 - you can ask:

?- phrase(begintag(T), [<, test, >]).
T = test.

and also in the other direction:

?- phrase(begintag(test), Ls).
Ls = [<, test, >].

In the case of begintag//1, the tag name seems to be important to you, so you have introduced an argument for it that, on the one hand, lets you generate an opening tag with a given name, and, on the other hand, lets you parse a tag and extract its name, and, on the third hand, ask even the most general query:

?- phrase(begintag(T), Ls).
Ls = [<, T, >].

which abstractly relates an opening XML tag to its name.