1
votes

I'm writing a function parser, that parses this kind of functions:

<fsignature>"(" <term1>, <term2> ... <termn>")"

Before checking if a string is accepted by the language, I transform it into a list to then send it to the DCG rules. I wrote this for that:

is_funct(A) :- term_to_atom(A, X), atom_chars(X, L), phrase(expr, L).

The thing is, the code is quite brutal and just chunks every character into the list, so if there's a signature or literal longer than one character, or if there are nested functions, it just doesn't work. The correct algorithm for chunking it would be, I think:

List L, Z;
String F;

for(int i=0; i<L.length; i++){
  if(L[i]=="(" || L[i]==","){
    for(int j=i; L[j]!=")"||L[j]!=","; j++)
      F+=L[j];
    Z.append(F);
    if(Z[j-1]==")") Z.append(")");
  }
}

Or, scroll through the list until you find a comma or open bracket, then scoll backwards and create a string with every character you find until you find another bracket or comma, and when you find either append the string to the list, and if you found a bracket append it too. So you'd get something like

[foo, "(", bar, fizz, "(", buzz, ")", hello, ")"]

To correctly then see if it's accepted by the language.

How exactly do I translate this algorithm recursively in Prolog? I tried envisioning a solution, that would be

1) Split term at every comma (,) and put splitted strings in L
2) Find index of every bracket for ever element of L and split on the brackets
3) Reinsert brackets into list at the indexes saved

But it doesn't sound the right way to do it, and it's not recursive either!

What would be the right way to parse the strings be? Thanks in advance!

1
Can you show us a somewhat more complete example of what you want? - Daniel Lyons
I'm trying to figure out a better solution in Prolog, but I find that really hard, and probably because I'm looking at the problem from the wrong point of view. - nass.hbs

1 Answers

2
votes

Here's how I'd approach your problem, based on what I think you're asking. Your question is heavy on technical details for what I think is the wrong approach, so I'm not entirely sure this is the right direction, but it's what I got.

First, I think your grammar really looks something like this:

function ::= name '(' termlist ')'.
termlist ::= [] | nonemptytermlist.
nonemptytermlist ::= term | term ',' nonemptytermlist.
term     ::= name
          |  function.
name     ::= [A-Za-z][A-Za-z0-9_-]*

We're doing Prolog here, so the most "declarative" reading of your problem you can come up with is the one you want to code. Only after that sucks do you want to try to optimize it. BNF grammars are so common in Prolog the language has built-in support for them: definite clause grammars.

There's recursion in this grammar, but that's not a huge problem in grammars as long as it's in the correct position, which is usually not the first or leftmost position. This is EBNF-ish, which should convert fairly readily to DCG notation.

function --> fname, "(", termlist, ")".
termlist --> [] | nonemptytermlist.
nonemptytermlist --> term | term, ",", nonemptytermlist.
term    --> fname | function.
fname    --> [C], { char_type(C, alpha) }, namebody.
namebody --> [C], { char_type(C, alnum) ; C = '_' ; C = '-' }, namebody.
namebody --> [].

This actually seems to work, but it isn't hugely useful:

?- atom_codes("foo(this,bar(that),another)", X), phrase(function, X).
X = [102, 111, 111, 40, 116, 104, 105, 115, 44|...] ;

It may not be obvious here, but it has successfully parsed this sentence with the function DCG rule. You're just not getting anything back. So the next thing to do is make your grammar rules build the structure you want.

function(F) -->
    fname(Name),
    "(", termlist(List), ")",
    { F =.. [Name|List] }.

termlist([])   --> [].
termlist(List) --> nonemptytermlist(List).

nonemptytermlist([X])    --> term(X).
nonemptytermlist([X|Xs]) --> term(X), ",", nonemptytermlist(Xs).

term(Term)     --> fname(Term).
term(Function) --> function(Function).

fname(Name)    --> [C], { char_type(C, alpha) }, namebody(Cs),
    { atom_codes(Name, [C|Cs]) }.

namebody([C|Cs]) -->
    [C],
    { char_type(C, alnum) ; C = '_' ; C = '-' },
    namebody(Cs).
namebody([]) --> [].

All we've done here is lightly restructure things and pass back through the DCG rule arguments values that got parsed by each rule. Now you can see this successfully parses complicated structures:

?- atom_codes("foo(this,bar(qwerty,uiop),that(),little())", X), phrase(function(F), X).
X = [102, 111, 111, 40, 116, 104, 105, 115, 44|...],
F = foo(this, bar(qwerty, uiop), that, little)

The grammar has successfully converted the string into a Prolog term. Unfortunately, Prolog doesn't see much point to foo() so those parentheses have been dropped. This is owing to the "univ" operator =.. which we are using to convert a list of function name and arguments into a Prolog structure. It may be the case that Prolog structures are not easy enough for you to process; in that case remove the "univ" step on function like so:

function([Name|List]) -->
    fname(Name),
    "(", termlist(List), ")".

Using it returns this:

?- atom_codes("foo(this,bar(qwerty,uiop),that(),little())", X), phrase(function(F), X).
X = [102, 111, 111, 40, 116, 104, 105, 115, 44|...],
F = [foo, this, [bar, qwerty, uiop], [that], [little]] ;
false.

You still can't distinguish terms from empty functions. You could fix that by making term//1 more explicit:

function(Name, Args) -->
    fname(Name),
    "(", termlist(Args), ")".
% ...
term(term(Term))           --> fname(Term).
term(function(Name, Args)) --> function(Name, Args).

The effect is a lot more verbose:

?- atom_codes("foo(this,bar(qwerty,uiop),that(),little())", X), phrase(function(F,A), X).
X = [102, 111, 111, 40, 116, 104, 105, 115, 44|...],
F = foo,
A = [term(this), function(bar, [term(qwerty), term(uiop)]), function(that, []), function(little, [])]

This may be easier or harder for you to process. My rule of thumb would be to try to keep as close to Prolog structure as you can, but this may cause discomfort.

Anyway, I hope this helps.