
I am using definite clause grammars to parse string literals in Prolog, but this grammar rule can only parse string literals that contain alphabetic characters:

string_literal(S) --> "\"", symbol(S), "\"".
symbol([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([])     --> [].
letter(Let)     --> [Let], { code_type(Let, alpha) }.

Is it possible to write a DCG rule that can parse string literals with other types of symbols?


1 Answer


In SWI-Prolog, library(dcg/basics) provides several ready-to-use nonterminals. Its source code is worth studying.
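As a minimal sketch of the library route, assuming SWI-Prolog 7 or later and that escape sequences inside the literal do not need to be handled, string_without//2 from library(dcg/basics) can collect every code up to the closing double quote:

```prolog
:- use_module(library(dcg/basics)).

% string_without(EndCodes, Codes) takes codes until one of EndCodes appears;
% the terminating code is left on the input, so the closing quote is
% consumed explicitly afterwards.
string_literal(S) --> "\"", string_without([0'"], S), "\"".
```

A query such as `atom_codes('"hi 42!"', Cs), phrase(string_literal(S), Cs)` should then bind S to the codes between the quotes, whatever their character type.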

Otherwise, to generalize a bit, you could pass the character type to the matching rule and then combine the primitives at will:

char(Type, C) --> [C], { code_type(C, Type) }.

letter(L) --> char(alpha, L).
digit(D) --> char(digit, D).
lower_or_num(C) --> char(lower, C) | digit(C).
...
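Combining these primitives into a string literal rule might look like the sketch below. The names string_char//1 and string_chars//1 are hypothetical helpers, not part of any library, and the choice of the `graph` and `space` types (every visible or whitespace character except the double quote) is an assumption about which characters should be allowed:

```prolog
% char//2 as defined above, repeated so the example is self-contained
char(Type, C) --> [C], { code_type(C, Type) }.

% Hypothetical helper: a visible character other than the double quote,
% or any whitespace character
string_char(C) --> char(graph, C), { C \== 0'" }.
string_char(C) --> char(space, C).

% Zero or more allowed characters, longest match first
string_chars([C|Cs]) --> string_char(C), string_chars(Cs).
string_chars([])     --> [].

string_literal(S) --> "\"", string_chars(S), "\"".
```

Because string_char//1 never accepts a double quote, the only quote that can end the literal is the closing one, so input with a stray quote in the middle is rejected rather than partially matched.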

One possibility is to skip over unwanted characters (here, only newlines and single quotes):

string_literal(S) --> "\"", string_inner(S).

string_inner([]) --> "\"".
string_inner(Cs) --> [C],
    { ( C == 0'\n ; C == 0'' ) -> Cs = Rs ; Cs = [C|Rs] },
    string_inner(Rs).

edit

To prevent it from matching strings that contain double quotes:

The construct if -> then ; else fails when the else branch is omitted and the condition is false, so one attempt could be:

...
{ ( C == 0'\n ; C == 0'' ) -> Cs = Rs ; C \== 0'" -> Cs = [C|Rs] },
...
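Put together, the complete rule with that change might read as follows. This is a sketch assembled from the fragments above, assuming SWI-Prolog, not a tested reference implementation:

```prolog
string_literal(S) --> "\"", string_inner(S).

% A double quote closes the literal.
string_inner([]) --> "\"".
string_inner(Cs) --> [C],
    {   ( C == 0'\n ; C == 0'' ) -> Cs = Rs       % skip newline and single quote
    ;   C \== 0'"                -> Cs = [C|Rs]   % keep anything but a double quote
    },                                            % no else branch: fail on a double quote
    string_inner(Rs).
```

With this version the second clause can no longer consume a double quote on backtracking, so a call through phrase/2 fails on input that has a double quote inside the literal instead of accepting it.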