
I want to write a DCG that can parse text as it appears in Notepad. I've read some online tutorials about writing DCGs, but none of them dealt with free-form text that mixes strings, dates and integers. I'm not really sure how to even start writing the DCG (how to represent a line, or even a date). Any help?

Try looking at the code of this SWI-Prolog library: swi-prolog.org/pldoc/doc/swi/library/dcg/basics.pl?show=src - user1812457
I have it in a CSV, but I wanted to check the DCG before extracting the data from the CSV. - user2512231
I want the DCG to parse each row, so that I can then extract the column of data I need. For example, I might want to sum all the values of the column Tape A. - user2512231

1 Answer


The 'trick' is to approach the problem declaratively, putting the more selective patterns first. Your data format has an apparently well-defined columnar structure, which can be handled using library(dcg/basics):

:- use_module(library(dcg/basics)).

row([Date,Key|Numerics]) -->
 date(Date), separe, key(Key), separe, numeric_pairs(Numerics).

date(D/M/Y) -->
 integer(D), "/", integer(M), "/", integer(Y).

key([F|Ks]) -->
 [F], {F \= 0' }, string(Ks).

numeric_pairs([Num:Perc|NPs]) -->
 integer(Num), separe, number(Perc), "%", whites, !, numeric_pairs(NPs).
numeric_pairs([]) --> [].

separe --> white, whites.

Test:

?- atom_codes('02/18/2014  BATS Z  235122734   6.90%   109183482   10.50%  147587409   7.80%', Cs), phrase(row(R), Cs).
Cs = [48, 50, 47, 49, 56, 47, 50, 48, 49|...],
R = [2/18/2014, [66, 65, 84, 83, 32, 90], 235122734:6.9, 109183482:10.5, 147587409:7.8] 
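Note that the key comes back as a list of character codes ([66, 65, 84, 83, 32, 90] is "BATS Z"). If an atom is more convenient, something like the following could convert it. This is only a sketch: row_key_atom/2 is a hypothetical helper name, not part of the grammar above.

```prolog
% row_key_atom(+Row, -Key): hypothetical helper.
% Row has the shape [Date, KeyCodes | NumericPairs], as produced by row//1.
% Key is the key converted from a code list to an atom.
row_key_atom([_Date, KeyCodes|_], Key) :-
    atom_codes(Key, KeyCodes).
```

For the row shown above, row_key_atom(R, K) binds K to the atom 'BATS Z'.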

I must say that it isn't very easy to debug: when Prolog backtracks, you get no hint about what went wrong. There should be a specialized trace, I guess...

To feed the DCG, see library(pure_input), or (easier to debug) fetch one line at a time with read_line_to_codes/2.
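A minimal sketch of the line-at-a-time approach follows. The names parse_lines/2 and line//1 are hypothetical, and line//1 is deliberately simplified: it only picks up the leading date and ignores the rest of the line. Lines that do not parse are skipped, so a failing line can be inspected on its own.

```prolog
:- use_module(library(dcg/basics)).
:- use_module(library(readutil)).

% line(-Date): simplified, hypothetical grammar for one line.
% It parses a leading D/M/Y date and ignores whatever follows.
line(D/M/Y) -->
    integer(D), "/", integer(M), "/", integer(Y), remainder(_).

% parse_lines(+Stream, -Rows): hypothetical helper.
% Reads Stream one line at a time and collects the result of every
% line accepted by line//1; other lines are silently skipped.
parse_lines(Stream, Rows) :-
    read_line_to_codes(Stream, Codes),
    (   Codes == end_of_file
    ->  Rows = []
    ;   (   phrase(line(Row), Codes)
        ->  Rows = [Row|Rest]
        ;   Rows = Rest
        ),
        parse_lines(Stream, Rest)
    ).
```

With a file, this could be called as setup_call_cleanup(open(File, read, S), parse_lines(S, Rows), close(S)).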

Edit: maybe my hint to use read_line_to_codes/2 was a bad one.

Here is a complete scan of your test data using phrase_from_file/2, followed by selecting the requested column and summing it (the column index is passed as an argument).

:- use_module(library(dcg/basics)).
:- use_module(library(pure_input)).

test(ColToSum, Tot) :-
    phrase_from_file(row(Rows), '/tmp/test.txt'),
    maplist(get_col(ColToSum), Rows, Cols),
    sum_list(Cols, Tot).

get_col(ColToSum, Row, Col) :-
    nth1(ColToSum, Row, Col:_).

row([[Date,Key|Numerics]|Rows]) -->
 date(Date), separe, key(Key), separe, numeric_pairs(Numerics), "\n",
 row(Rows).
row(Rows) -->
 string(_), "\n",
 row(Rows).
row([]) --> [].

date(D/M/Y) -->
 integer(D), "/", integer(M), "/", integer(Y).

key([F|Ks]) -->
 [F], {F \= 0' }, string(Ks).

numeric_pairs([Num:Perc|NPs]) -->
 integer(Num), separe, number(Perc), "%", whites, !, numeric_pairs(NPs).
numeric_pairs([]) --> [].

separe --> white, whites.

That yields:

?- test(3,X).
X = 561877153 

If you're using Windows, use "\r\n" as line terminator...
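One way to avoid hard-coding the terminator is a small nonterminal that accepts either form. This is a sketch only: eol//0 and lines//1 are hypothetical names, not part of library(dcg/basics), and lines//1 is just a toy grammar to show eol//0 in use.

```prolog
:- use_module(library(dcg/basics)).

% eol//0: hypothetical helper that accepts a Windows ("\r\n")
% or a Unix ("\n") line terminator.
eol --> "\r\n", !.
eol --> "\n".

% lines(-Lines): toy grammar that splits the input into terminated
% lines, each returned as a list of codes without its terminator.
lines([L|Ls]) -->
    string_without([0'\r, 0'\n], L), eol, !,
    lines(Ls).
lines([]) --> [].
```

In the grammar above, the literal "\n" at the end of each row//1 clause could then be replaced by eol.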

HTH