I want to write a DCG that can parse text from a plain text file (the kind I'd open in Notepad). I've read some online tutorials about writing DCGs, but none of them dealt with free-form text that mixes strings, dates and integers. I'm not really sure how to even start writing the DCG (how to represent a line, or even a date). Any help?
The 'trick' is to approach the problem declaratively, putting the more selective patterns first. Your data format has an apparently well-defined columnar structure, so it can be handled with library(dcg/basics):
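```prolog
:- use_module(library(dcg/basics)).

row([Date,Key|Numerics]) -->
    date(Date), separe, key(Key), separe, numeric_pairs(Numerics).
% The sample dates are month/day/year.
date(M/D/Y) -->
    integer(M), "/", integer(D), "/", integer(Y).
% A key is any text not starting with a space; string//1 is lazy, so
% backtracking lengthens it until the rest of the row parses.
key([F|Ks]) -->
    [F], { F \== 0'\s }, string(Ks).
% Pairs like "235122734 6.90%".
numeric_pairs([Num:Perc|NPs]) -->
    integer(Num), separe, number(Perc), "%", whites, !, numeric_pairs(NPs).
numeric_pairs([]) --> [].
% One or more spaces/tabs.
separe --> white, whites.
```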
Test:

```prolog
?- atom_codes('02/18/2014 BATS Z 235122734 6.90% 109183482 10.50% 147587409 7.80%', Cs), phrase(row(R), Cs).
Cs = [48, 50, 47, 49, 56, 47, 50, 48, 49|...],
R = [2/18/2014, [66, 65, 84, 83, 32, 90], 235122734:6.9, 109183482:10.5, 147587409:7.8]
```
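The key comes back as a list of character codes (`[66, 65, 84, 83, 32, 90]` is `BATS Z`). If you'd rather have an atom, one option is to convert it inside the nonterminal. This is a sketch: `key_atom//1` is a hypothetical name, not part of any library.

```prolog
:- use_module(library(dcg/basics)).

% key_atom(-Key)// : hypothetical variant of key//1 that returns an atom.
% string//1 is lazy, so on backtracking the key grows one code at a time
% and atom_codes/2 is simply re-run on the longer code list.
key_atom(Key) -->
    [F], { F \== 0'\s },
    string(Ks),
    { atom_codes(Key, [F|Ks]) }.
```

In row//1 you would then call `key_atom(Key)` in place of `key(Key)`.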
I must say this isn't very easy to debug: when the parse fails, Prolog just backtracks and gives no hint about what went wrong. A specialized trace would help, I guess.
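One low-tech way to locate a failure is to run a single nonterminal with phrase/3 and look at what it leaves unconsumed: if the rest is not what you expect, the problem is in that nonterminal or the one after it. A sketch, where `parse_prefix/3` is a hypothetical helper name and `date//1` is repeated so the block loads on its own:

```prolog
:- use_module(library(dcg/basics)).

% Month/day/year, as in the sample data.
date(M/D/Y) -->
    integer(M), "/", integer(D), "/", integer(Y).

% parse_prefix(+NT, +Text, -Rest): hypothetical helper. Parse a prefix of
% the atom Text with nonterminal NT; Rest is the unconsumed input as an atom.
parse_prefix(NT, Text, Rest) :-
    atom_codes(Text, Cs),
    phrase(NT, Cs, RestCs),
    atom_codes(Rest, RestCs).
```

For example, `parse_prefix(date(D), '02/18/2014 BATS Z', Rest)` gives `D = 2/18/2014` and `Rest = ' BATS Z'`, which tells you the date parsed and the next thing to match is the separator.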
To feed the DCG from a file, see library(pure_input); or, easier to debug, fetch one line at a time with read_line_to_codes/2.
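Reading one line at a time means each phrase/2 call sees a single line, so a failure is easy to pin to the line that caused it. A sketch using read_line_to_codes/2 from library(readutil), with the one-line row//1 grammar repeated so the block loads on its own; `parse_lines/2` is a hypothetical name, and lines that don't parse are simply skipped:

```prolog
:- use_module(library(dcg/basics)).
:- use_module(library(readutil)).

% Grammar for one line; read_line_to_codes/2 has already removed the newline.
row([Date,Key|Numerics]) -->
    date(Date), separe, key(Key), separe, numeric_pairs(Numerics).
date(M/D/Y) -->
    integer(M), "/", integer(D), "/", integer(Y).
key([F|Ks]) -->
    [F], { F \== 0'\s }, string(Ks).
numeric_pairs([Num:Perc|NPs]) -->
    integer(Num), separe, number(Perc), "%", whites, !, numeric_pairs(NPs).
numeric_pairs([]) --> [].
separe --> white, whites.

% parse_lines(+Stream, -Rows): hypothetical helper. Parse every line of
% Stream with row//1, skipping lines that do not match.
parse_lines(Stream, Rows) :-
    read_line_to_codes(Stream, Line),
    (   Line == end_of_file
    ->  Rows = []
    ;   phrase(row(Row), Line)
    ->  Rows = [Row|Rest],
        parse_lines(Stream, Rest)
    ;   parse_lines(Stream, Rows)
    ).
```

You would call it as `setup_call_cleanup(open(File, read, S), parse_lines(S, Rows), close(S))`.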
Edit: maybe my hint to use read_line_to_codes/2 was a bad one.
Here is a complete scan of your test data, using phrase_from_file/2 to parse the whole file, then selecting the column given as argument and summing it.
```prolog
:- use_module(library(dcg/basics)).
:- use_module(library(pure_input)).

% Sum column ColToSum (1-based, counting date and key) over all data rows.
test(ColToSum, Tot) :-
    phrase_from_file(row(Rows), '/tmp/test.txt'),
    maplist(get_col(ColToSum), Rows, Cols),
    sum_list(Cols, Tot).

% Take the count from the Num:Perc pair at position ColToSum.
get_col(ColToSum, Row, Col) :-
    nth1(ColToSum, Row, Col:_).

% A data line: parse it and continue with the next line.
row([[Date,Key|Numerics]|Rows]) -->
    date(Date), separe, key(Key), separe, numeric_pairs(Numerics), "\n",
    row(Rows).
% Any other line (header, blank, ...): skip it.
row(Rows) -->
    string(_), "\n",
    row(Rows).
row([]) --> [].

date(M/D/Y) -->
    integer(M), "/", integer(D), "/", integer(Y).
key([F|Ks]) -->
    [F], { F \== 0'\s }, string(Ks).
numeric_pairs([Num:Perc|NPs]) -->
    integer(Num), separe, number(Perc), "%", whites, !, numeric_pairs(NPs).
numeric_pairs([]) --> [].
separe --> white, whites.
```
That yields:

```prolog
?- test(3,X).
X = 561877153
```
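Note that ColToSum counts the date and the key too, so column 3 is the first count, 4 the second, and so on. Because summing is separate from parsing, the column arithmetic can be checked without a file. A sketch with made-up rows (illustrative numbers, not your data); `sum_column/3` and `sample_rows/1` are hypothetical names:

```prolog
% get_col/3 as above: the count from the Num:Perc pair at position ColToSum.
get_col(ColToSum, Row, Col) :-
    nth1(ColToSum, Row, Col:_).

% sum_column(+ColToSum, +Rows, -Tot): hypothetical helper that sums one
% column over rows that have already been parsed.
sum_column(ColToSum, Rows, Tot) :-
    maplist(get_col(ColToSum), Rows, Cols),
    sum_list(Cols, Tot).

% Hypothetical sample rows in the shape row//1 produces
% (keys written as atoms for brevity; the parser returns code lists).
sample_rows([ [2/18/2014, 'BATS Z', 10:1.5, 20:2.5],
              [2/19/2014, 'BATS Y', 30:3.0, 40:4.0] ]).
```

Here `sum_column(3, Rows, T)` adds 10 and 30, and `sum_column(4, Rows, T)` adds 20 and 40.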
If you're using Windows, use "\r\n" as the line terminator.
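A sketch of a file grammar that accepts both "\n" and "\r\n" endings, so the same code works on files from either platform. `rows//1`, `line_end//0` and `skip_line//0` are hypothetical names. It also differs from the version above in two ways: skip_line//0 uses string_without//2, so skipping can never run past a newline, and a last line without a trailing newline is accepted via eos//0 instead of making the whole parse fail. The cuts commit to a line once it has been handled.

```prolog
:- use_module(library(dcg/basics)).

% rows(-Rows)// : hypothetical. Collect parsed data lines, skip the rest.
rows([[Date,Key|Numerics]|Rows]) -->
    date(Date), separe, key(Key), separe, numeric_pairs(Numerics), line_end,
    !,                          % this line parsed: don't retry it as noise
    rows(Rows).
rows(Rows) -->
    skip_line, !,
    rows(Rows).
rows([]) --> [].

% Hypothetical: DOS or Unix line ending, or end of input on the last line.
line_end --> "\r\n".
line_end --> "\n".
line_end --> eos.

% Hypothetical: consume one non-data line including its newline, or a
% non-empty last line that has no newline.
skip_line --> string_without([0'\n], _), "\n".
skip_line --> string_without([0'\n], Cs), { Cs \== [] }, eos.

date(M/D/Y) -->
    integer(M), "/", integer(D), "/", integer(Y).
key([F|Ks]) -->
    [F], { F \== 0'\s }, string(Ks).
numeric_pairs([Num:Perc|NPs]) -->
    integer(Num), separe, number(Perc), "%", whites, !, numeric_pairs(NPs).
numeric_pairs([]) --> [].
separe --> white, whites.
```

To use it, call `phrase_from_file(rows(Rows), File)` in test/2 instead of `row(Rows)`.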
HTH