2
votes

Playing around with DCGs and stubled upon the following problem:

I want to parse as well as produce an exact amount of spaces (" "). My trivial approach of simply doing this:

trivial_nat_space(0) --> [].
trivial_nat_space(N) -->
    { N > 0, N is M+1 },
    " ",
    trivial_nat_space(M).

failed terribly, because of insufficient instantiation of N and M depending on whether i do

?- String="   ", my_string_codes(String,Codes), phrase(trivial_nat_space(Anz_Spaces), Codes, [])

or

?- Anz_Spaces=3,my_string_codes(String,Codes), phrase(trivial_nat_space(Anz_Spaces), Codes, [])

where (for convenience)

my_string_codes(S,C) :-
    when((ground(S);ground(C)), string_codes(S,C)).

searching for a nice solution to the problem I made a version that depends on self defined nats:

z.
s(z).
s(s(O)) :-
  s(O).

nat_num(S,C) :-
    when((ground(S);ground(C)),nat_num_(S,C)).
nat_num_(z,0) :- !.
nat_num_(s(N),X) :-
    nonvar(X),
    !,
    X > 0,
    Y is X-1,
    nat_num_(N,Y).
nat_num_(s(N),X) :-
    var(X),
    nat_num_(N,Y),
    X is Y+1.

n_space(z) --> [].
n_space(s(N)) -->
    " ",
    n_space(N).

which I would like to avoid because the additional encoding of the natural number is kind of already present in the builtin numbers.

and this:

nat_space(0) --> [].
nat_space(N) -->
    { var(N) },
    " ",
    nat_space(M),
    { N is M+1 }.
nat_space(M) -->
    { nonvar(M), M>0 },
    " ",
    { N is M-1 },
    nat_space(N).

which does work fine:

?- Anz_Spaces=3,my_string_codes(String,Codes), once(phrase(nat_space(Anz_Spaces), Codes, [])).
Anz_Spaces = 3,
String = "   ",
Codes = [32, 32, 32].

?- String="   ",my_string_codes(String,Codes), once(phrase(nat_space(Anz_Spaces), Codes, [])).
String = "   ",
Codes = [32, 32, 32],
Anz_Spaces = 3.

However the encoding of nat_spaces is (in my opinion) far from nice: it depends on meta-predicates to enforce a specific execution sequence, and (more seriously): if the parser were more complex than just " ", the logic would have to be defined in a seperate DCG predicate/rule resulting in the logic for a single parser/generator to be split into two definitions (the separated one and the one enforcing the correct execution sequence).

Is this the canonical/standard way of solving problems like this or is there a more general, elegant solution that I am missing right now?

Additional Info:

I am using SWI-Prolog version 8.3.9 for x86_64-linux with :- [library(dcg/basics)] and no additional arguments when starting the runtime. Nor do I set any settings in the file with the definitions.

2
Having spent much time with doing DCGs in Prolog your question has a lot of open ended things that need to be clarified. 1. Which Prolog? 2. Which version? 3. What are all of the Prolog flag settings? 4. Which modules were loaded? - Guy Coder
Why is parser/generator to be split into two definitions such a bad idea? I spent years in pursuit of unified parser/generator and while I do know of some nice tricks to help, e.g. CLP, in most cases it is not worth the time or effort to peruse. I do use single code bases for combined parser/generator in some cases but those are the exception. - Guy Coder
If you continue on this path then what happens when you get to harder problems that out of the box CLP can not solve. Then you need CHR and your code becomes so advanced that only you understand it and can maintain it. - Guy Coder
Of interest: abnf.pl by Wouter Beek - Note this uses difference list which will not work with most DCG example code or libraries such as dcg_basics. - Guy Coder
Of interest: regex - Regular expression support for Prolog by Michael Hendricks - Guy Coder

2 Answers

4
votes

Frankly, your original definition doesn't fail that terribly. No, it does not fail. For the most general query, it produces one solution,

?- phrase(trivial_nat_space(0), Cs).
   Cs = []                                    % pure perfect logic!
;  false.
?- phrase(trivial_nat_space(-1), Cs).
false.                                        % it's right to be false!
?- phrase(trivial_nat_space(1), Cs).
caught: error(instantiation_error,(is)/2)     % er...
?- phrase(trivial_nat_space(N), Cs).          % most general query
   Cs = [], N = 0                             % again, pure logic 
;  caught: error(instantiation_error,(is)/2)  % mmh...

... and otherwise an instantiation error. Instantiation errors are not the worst that can happen. They clearly and honestly state that more information (= instantiations) must be provided before we can continue. That is much better than to pretend everything is fine when it is not. Think of a clerk who asks for more information as producing an instantiation error. And then compare this to one that just fills out your IRS forms with some bold default assumptions1.

To localize the reason for an instantiation error, we will use a . So I will throw in some false goals and also an additional instantiation to make it even easier:

trivial_nat_space(0) --> [], {false}.
trivial_nat_space(N) --> {N = 1},
    { N > 0, N is M+1, false },
    " ",
    trivial_nat_space(M).

?- phrase(trivial_nat_space(1), Cs).
caught: error(instantiation_error,(is)/2)

This is a pretty disfunctional program! And still it produces an instantiation error. In order to fix your original program we have to modify something in the remaining visible part. In N is M+1 only the M can cause that error. In fact, it occurs here for the first time. We can replace it by M is N-1 which improves your program:

?- phrase(trivial_nat_space(1), Cs).
   Cs = " "                                % see section double quotes
;  false.
?- phrase(trivial_nat_space(2), Cs).
   Cs = "  "
;  false.
?- phrase(trivial_nat_space(N), Cs).
   Cs = [], N = 0
;  caught: error(instantiation_error,(is)/2) % still ...
?- phrase(trivial_nat_space(N), " ").
caught: error(instantiation_error,(is)/2)

Our program now works at least when the concrete number of spaces is known. Even better, we can also use arithmetic expressions!

?- phrase(trivial_nat_space(4*1), Cs).    % let's indent by four spaces
   Cs = "    "
;  false.
?- phrase(trivial_nat_space(4*2), Cs).    % ... twice ...
   Cs = "        "
;  false.
?- phrase(trivial_nat_space(4*0), Cs).    % ... none?
false.
?- phrase(trivial_nat_space(0), Cs).
   Cs = []                                % here it works
;  false.

So N may be an arithmetic integer expression, and it works as expected, except for 0 which must be stated literally. That is not really a deep problem, no algebraic properties are violated. But elegant it is not. Let's remember that.

Back to the instantiation errors. To handle these cases as well we need some way to deal with this variable N. The easiest way is to use library(clpz) or its predecessor in SWI library(clpfd) as proposed in another answer. And yes, you can do such things manually, but thereby you are duplicating the work that has been invested into that library. It might make sense for performance reasons sometimes, but it will come at a hefty (bug ridden) price.

So let's look at @gusbro's solution and don't forget to add

:- use_module(library(clpz)).  % SICStus or Scryer
:- use_module(library(clpfd)). % SWI
?- phrase(trivial_nat_space(N), Cs).
   Cs = [], N = 0
;  Cs = " ", N = 1                     % finally, logic!
;  Cs = "  ", N = 2
;  Cs = "   ", N = 3
;  ...
?- phrase(trivial_nat_space(N), "  ").
   N = 2
;  false.
?- N = 1+1, phrase(trivial_nat_space(N), "  ").
   N = 1+1                               % nice like is/2
;  false.
?-          phrase(trivial_nat_space(N), "  "), N = 1+1.
false.                                   % and out, why?

Everything is nice and dandy, up to the last query. So that extension with arithmetic expressions did not work out so nicely. Effectively it boils down to the following problem:

?- N = 1+1, N #= 2.
   N = 1+1.
?-          N #= 2, N = 1+1.
false.

In the first query, we solve the integer-equation 1+1 #= 2 which succeeds, and in the second query, we solve N #= 2 which succeeds with N = 2 and then we try to solve 2 = 1+1 which fails.

In other words, that extension into general arithmetic expressions did not work so well for constraints. Before, instantiation errors hid the problem. Now we need to draw somehow the line. And violating commutativity as above is not a nice option2.

The solution is to separate expression variables and integer variables explicitly and insist on fully instantiated expressions.

?- N = 1+1, #N #= 2.
caught: error(type_error(integer,1+1),must_be/2)
?-          #N #= 2, N = 1+1.
false.
?- assertz(clpz:monotonic).
   true.
?-          N #= 2, N = 1+1.
caught: error(instantiation_error,instantiation_error(unknown(_102),1))

So now @gusbro's program gets some slight modification:

trivial_nat_space(0) --> [].
trivial_nat_space(N) -->
    { #N #> 0, #M #= #N-1 },
    " ",
    trivial_nat_space(M).

double_quotes

Since you want elegant code, consider to use as a single representation for text: lists of characters. In this manner you avoid all this converting code which will never be elegant. In some systems like Tau, Trealla, and Scryer, double quoted items are chars by default. In SWI proceed like so:

?- L = "ab", L = [_,_].
false.

?- phrase("ab","ab").
false.

?- set_prolog_flag(double_quotes, chars).
true.

?- L = "ab", L = [_,_].
L = [a, b].

?- phrase("ab","ab").
true.

And with library(double_quotes)

?- L = "ab", L = [_,_].
L = "ab".

Grammars

Finally, there is something to note about multi-directional grammar rules in general. Think of a predicate text_ast/2. For one Abstract Syntax Tree, there is an infinity of valid program texts which all differ by trivial paraphernalia like layout text. Therefore, this general relation must not terminate when only the AST is given. So you would need an extra parameter indicating whether the text should be canonical or not.



1 And in fact in DEC10 Prolog the default assumption for variables in arithmetic expressions was the value zero. ISO Prolog has defined instantiation errors for those situations.


2 In SICStus' native library clpfd, the same problem appears with ?- N = 1+1, call(N #= 2). instead.

3
votes

For your specific example, you can use CLP(fd) to be able to use the DCG in both ways:

trivial_nat_space(0) --> [].
trivial_nat_space(N) -->
    { N #> 0, M #= N-1 },
    " ",
    trivial_nat_space(M).

In the following sample runs I will use backticks (`) to use coded strings:

?- phrase(trivial_nat_space(Anz_Spaces), `   `, []).
Anz_Spaces = 3 ;
false.

?- phrase(trivial_nat_space(3), Spaces, []).
Spaces = [32, 32, 32] ;
false.

?- phrase(trivial_nat_space(N), Spaces, []).
N = 0,
Spaces = [] ;
N = 1,
Spaces = [32] ;
N = 2,
Spaces = [32, 32] ;
N = 3,
Spaces = [32, 32, 32] 
 ...