Prolog: Parsing a term for semantic correctness (pattern matching)

Question

I am working on an assignment and I am woefully stuck. I have little experience with prolog and I am learning on the job based on online material and a textbook.

The aim of the assignment is to produce a predicate, parse(), that is able to check that a given argument/term is of the following form, returning true or false as appropriate:

((int,int),[(int,float),(int,float)...])

So an int pair, followed by an n-sized list of (int,float) pairs, with both wrapped up in a tuple.

For example:

( (1, 7), [(1, 0.0), (2, 0.5), (3, 0.7), (4, 0.8), (5, 0.8), (6, 0.25), (7, 0.3)] )

Could anyone point me in the right direction for where to begin on this? I haven't ever done this sort of this in Prolog.

I am not looking for a full coded answer, naturally this is an assignment, but just some direction for how to approach this problem. I have no declarative programming background, so this is an educational exercise.

Thanks!

mat mat · Accepted Answer · 2017-04-19T07:38:00

We can start with the observation that what you show is already a valid Prolog term:

?- T = ( (1, 7), [(1, 0.0), (2, 0.5), (3, 0.7), (4, 0.8), (5, 0.8), (6, 0.25), (7, 0.3)] ) .
T =  ((1, 7), [(1, 0.0),  (2, 0.5),  (3, 0.7),  (4, 0.8),  (5, 0.8),  (6, 0.25),  (7, 0.3)]).

So, in such a case, there is no manual parsing necessary. By parsing, we typically mean the conversion of unstructured text, such as plain lists of characters or character codes, to a structured representation.

Prolog automatically rejects terms that are not syntactically valid. So, in your case, if you already have such a term, all it takes is to state what must hold about it so that it satisfies all constraints.

In Prolog, it is often useful to first think about building blocks you may want to use as a useful basis. For example, let us describe an (int,float) term:

integer_float_tuple((I,F)) :- integer(I), float(F).

We can use it as follows:

?- integer_float_tuple((3,4.5)).
true.

Note though:

First, the definition is not a true relation. For example, the most general query ?- integer_float_tuple(T). will fail even though there clearly are solutions!
Second, using terms of the form (A,B) is discouraged, also because (',')/2 is already used for so many other things in Prolog. It would be better to represent such pairs as I-F, using (-)/2 which is commonly used to denote pairs throughout Prolog libraries.

In any case, we can build on this definition with:

good_term((Tuple,Tuples)) :-
        Tuple = (A,B),
        integer(A), integer(B),
        maplist(integer_float_tuple, Tuple).

This states everything that must hold for such terms, and thus lets you validate them although, as noted, not generate them. To generalize such predicates to work in all directions, use your Prolog system's CLP(FD) and CLP(R) constraints. This will give you a definition that is monotonic and satisfies many desirable properties we expect from a logic program.

Further, if you really want to parse your data manually (as opposed to relying on Prolog's built-in notion of structured terms), I have two recommendations:

First, add the following directive to your program:

:- set_prolog_flag(double_quotes, chars).

This lets you conveniently write lists of characters with double quotes.

For example:

?- Cs = "abc".
Cs = [a, b, c].

Such a representation is very convenient for what we do next, which is using Prolog Definite Clause Grammars for parsing such lists of characters.

I won't solve the whole task for you, but I give you a start. We can start by describing what we mean by a digit:

digit(D) --> [D], { char_type(D, digit) }.

We use such a grammar with the phrase/2 interface predicate. For example:

?- phrase(digit(D), "5").
D = '5'.

Building on this definition, let us describe lists of digits:

digits([]) --> [].
digits([D|Ds]) --> digit(D), digits(Ds).

Example query:

?- phrase(digits(Ds), "5235").
Ds = ['5', '2', '3', '5'] ;
false.

Finally, let us combine all this to decsribe an integer:

integer(I) --> digit(D), digits(Ds), { number_chars(I, [D|Ds]) }.

With number_chars/2, we are performing the crucial conversion between characters and actual integers, which you need in your own representation.

Example query:

?- phrase(integer(I), "5235").
I = 5235 ;
false.

Using such building blocks, you can conveniently describe the general outline of such lists of characters by forther grammar elements.

Overall, your eventual grammar may look like:

pattern --> "(", whitespace, "(", integer, ",", integer, ")", ...

Where you need to supply definitions for whitespace//0 and complete the whole description.

See dcg for more information about this approach.

Prolog: Parsing a term for semantic correctness (pattern matching)

2 Answers