1
votes

I need to read the contents of a fixed-format text file into a list of lists (LL) in Prolog but I want to exclude the first and last element on each line from the list for that line. The very first line of the input file includes the number of rows (number of lists in LL) and columns (number of elements per list in LL). An example input file with 3 rows and 4 columns is

3 4
A B C D Cd
1 9 3   7   4   7
2 6 8   4   0   32
3 2 4   3   8   42
Ab 140  21  331 41 55

and I would like

LL = [[9,3,7,4],[6,8,4,0],[2,4,3,8]]

How can I exclude the first and last element on each line from LL?

I have tried reading the SWI-Prolog documentation and searching for relevant threads here, but I have been unsuccessful.

readAll( InStream, [W|L] ) :-
     readWordNumber( InStream, W ), !,
     readAll( InStream, L ).

readAll( InStream, [] ) :-
     \+readWordNumber(InStream,_).

lst_2_lst_of_lst([], _N, []).
lst_2_lst_of_lst(L, N, LL) :-
    lst_2_lst_of_lst_helper(L, 1, N, LL).
lst_2_lst_of_lst_helper([H|T], N, N, [[H]|LL]):-
    lst_2_lst_of_lst(T, N, LL). 
lst_2_lst_of_lst_helper([H|T], N1 , N, [[H|TMP]|LL]):-
    N2 is N1 + 1,
    lst_2_lst_of_lst_helper(T, N2 , N, [TMP| LL]).    

After calls to

...readAll(F,Input), ...
lst_2_lst_of_lst(Input, C, LL)

(C is 4, read in from first line of F, the text file)

My current result looks like this

LL = [[1,9  3   7  4 7,2,6   8   4 0 32],[3,2  4  3 8  42,Ab,140  21 331     41]]

and I would like it to look like this

LL = [[9,3,7,4],[6,8,4,0],[2,4,3,8]]
3
DCG is your friend. Perhaps an introduction is necessary. - Primer - Guy Coder

3 Answers

2
votes

I would separate parsing the file from cleaning up the lines. Let's say we have a predicate that captures each line as a list of tokens. Then the following could be applied:

cleanup([_,_|Data],Clean) :-
    remove_last(Data,DataC),
    maplist([[_|L],C]>>remove_last(L,C),DataC,Clean).
remove_last(L,C) :-
    append(C,[_],L).

Capturing lines of tokens could be

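% Collect tokenized lines until read_a_line/2 fails at end of file.
readAll(InStream,[Line|Lines]) :-
    read_a_line(InStream,Line),
    !,  % commit, so backtracking cannot yield a truncated list
    readAll(InStream,Lines).
readAll(_InStream,[]).

% Read one line and split it into tokens; fails at end of file.
read_a_line(F,L) :-
    read_line_to_string(F,S),
    S\=end_of_file,
    tokenize_atom(S,L).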

To illustrate some of the IO facilities of SWI-Prolog, a quick test:

?- data(F),open_any(string(F),read,Stream,Close,[]),readAll(Stream,AllLines),cleanup(AllLines,Clean),Close.
F = "3 4\nA B C D Cd\n1 9 3   7   4   7\n2 6 8   4   0   32\n3 2 4   3   8   42\nAb 140  21  331 41 55",
Stream = <stream>(0x7f37b039e5d0),
Close = close(<stream>(0x7f37b039e5d0)),
AllLines = [[3, 4], ['A', 'B', 'C', 'D', 'Cd'], [1, 9, 3, 7, 4, 7], [2, 6, 8, 4, 0|...], [3, 2, 4, 3|...], ['Ab', 140, 21|...]],
Clean = [[9, 3, 7, 4], [6, 8, 4, 0], [2, 4, 3, 8]] 

where data(F) actually binds F to the string you have in your example file.

Without a lambda, we need a single-use helper predicate, for instance:

cleanup([_,_|Data],Clean) :-
    remove_last(Data,DataC),
    maplist(remove_first_and_last,DataC,Clean).
    % lambda version: maplist([[_|L],C]>>remove_last(L,C),DataC,Clean).
remove_first_and_last([_|L],C) :-
    append(C,[_],L).
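
As a quick check that does not touch the file system, the lambda-free version can be run against a hand-written token list. This is only a sketch: example_lines/1 is a hypothetical predicate that mirrors the AllLines value shown in the test above.

```prolog
:- use_module(library(apply)).   % maplist/3
:- use_module(library(lists)).   % append/3

% Drop the size line and the header line, then the footer line,
% then the first and last token of every remaining line.
cleanup([_,_|Data], Clean) :-
    remove_last(Data, DataC),
    maplist(remove_first_and_last, DataC, Clean).

% remove_last(+List, -Init): Init is List without its final element.
remove_last(L, C) :-
    append(C, [_], L).

remove_first_and_last([_|L], C) :-
    remove_last(L, C).

% Hypothetical stand-in for the tokenized lines of the example file.
example_lines([[3,4],
               ['A','B','C','D','Cd'],
               [1,9,3,7,4,7],
               [2,6,8,4,0,32],
               [3,2,4,3,8,42],
               ['Ab',140,21,331,41,55]]).

% ?- example_lines(Ls), cleanup(Ls, Clean).
% Clean = [[9, 3, 7, 4], [6, 8, 4, 0], [2, 4, 3, 8]]
```

Note that a line with fewer than two tokens makes remove_first_and_last/2 fail, so blank lines in the input would have to be filtered out first.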
1
votes

I am not absolutely certain that I understand your requirements. Your input looks a bit like tabular data, but it also looks a bit like a file format of some kind. Which one is it? How is it actually defined? What is the importance of the second line of your example input? Is whitespace the column separator? The questions could go on.

Here is how I will interpret your problem:

  • The first line of input has two integer values separated by whitespace; these are the row and column counts, nrow and ncol.
  • The second line is not relevant (?).
  • Then nrow lines follow, each holding whitespace-separated columns of integers. Build a list of nrow rows; for each line:
    • skip the first column;
    • take the next ncol columns and put them in a list of integers.
  • Skip the rest of the input.

Writing this down is about 99% of the hard work (not saying it is hard, but for this problem, all of the "hardness" is in here).

Now you can go ahead and do the easy part: write the code. SWI-Prolog provides this great little library called dcg/basics. With it, I came up with this (in a hurry):

$ cat ignore.pl
:- use_module(library(dcg/basics)).

read_stuff_from_stream(Stuff, Stream) :-
    phrase_from_stream(stuff(Stuff), Stream).

stuff(LL) -->
    integer(Nrow), white, whites, integer(Ncol), blanks_to_nl, !,
    string_without("\n", _Skip_this_line), "\n",
    rows(Nrow, Ncol, LL),
    remainder(_Skip_the_rest).

rows(0, _, []) --> !.
rows(Nrow, Ncol, [R|Rows]) --> { succ(Nrow0, Nrow) },
    skip_column,
    cols(Ncol, R),
    string_without("\n", _Skip_rest_of_line), "\n", !,
    rows(Nrow0, Ncol, Rows).

skip_column --> nonblanks(_Skip_this_column), white, whites.

cols(0, []) --> !.
cols(Ncol, [C|Cols]) --> { succ(Ncol0, Ncol) },
    integer(C), white, whites, !,
    cols(Ncol0, Cols).

It isn't "clean" code, but it is a starting point. It works for the example you gave:

$ cat example.txt
3 4
A B C D Cd
1 9 3   7   4   7
2 6 8   4   0   32
3 2 4   3   8   42
Ab 140  21  331 41 55
$ swipl -q
?- [ignore].
true.

?- setup_call_cleanup(open('example.txt', read, In), read_stuff_from_stream(Stuff, In), close(In)).
In = <stream>(0x55f44e03de50),
Stuff = [[9, 3, 7, 4], [6, 8, 4, 0], [2, 4, 3, 8]].

There is room for improvement in about ten different directions. If you don't understand something, ask.

1
votes

Complete code using DCG.

:- use_module(library(dcg/basics), except([eos/2])).       

:- set_prolog_flag(double_quotes, codes).

parse(LL) -->
    size(Rows,Columns),
    header,
    rows(Rows,Columns,LL),
    footer.

size(Row,Columns) -->
    integer(Row),
    whites,
    integer(Columns),
    "\n".

header -->
    string_without("\n",_),
    "\n".

rows(Rows0,Columns,[Item|Items]) -->
    { Rows0 > 0 },  % stop once the declared number of rows has been read
    row(Columns,Item),
    { Rows is Rows0 - 1 },
    rows(Rows,Columns,Items).
rows(0,_Columns,[]) --> [].

row(Columns,Values) -->
    integer(_), % Ignore first value
    whites,
    values(Columns,Values),
    integer(_), % Ignore last value
    "\n".

values(Columns0,[Item|Items]) -->
    { Columns0 > 0 },  % do not try to read past the declared column count
    value(Item),
    { Columns is Columns0 - 1 },
    values(Columns,Items).
values(0,[]) --> [].

value(Item) -->
    integer(Item),
    whites.

footer -->
    rest_of_line, !.

rest_of_line -->
    [_],
    rest_of_line.
rest_of_line --> [].

readAll(LL) :-
    phrase_from_file(parse(LL),'C:/ll.dat').

Test cases

:- begin_tests(data).

test(1) :-
    Input = "\c
        3 4\n\c
        A B C D Cd\n\c
        1 9 3   7   4   7\n\c
        2 6 8   4   0   32\n\c
        3 2 4   3   8   42\n\c
        Ab 140  21  331 41 55\n\c
    ",
    string_codes(Input,Codes),
    DCG = parse(LL),
    phrase(DCG,Codes,Rest),
    assertion( LL == [[9,3,7,4],[6,8,4,0],[2,4,3,8]] ),
    assertion( Rest == [] ).

test(2) :-
    Input_path = 'C:/ll.dat',
    DCG = parse(LL),
    phrase_from_file(DCG,Input_path),
    assertion( LL == [[9,3,7,4],[6,8,4,0],[2,4,3,8]] ).

:- end_tests(data).

Example run of test cases

?- run_tests.
% PL-Unit: data .. done
% All 2 tests passed
true.

Example run

?- readAll(LL).
LL = [[9, 3, 7, 4], [6, 8, 4, 0], [2, 4, 3, 8]].

Whenever you process lists, you should consider using DCGs (see the Primer).

The data is processed as character codes, so the values it is unified with also have to be character codes. Character codes are hard for people to read, so Prolog has an option to turn double-quoted items into lists of character codes. In the code, "abc" is translated during compilation/consulting into [97,98,99]. This is done with a Prolog flag.

:- set_prolog_flag(double_quotes, codes).
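
A minimal sketch of the effect of this flag, assuming the clauses are loaded from a file in SWI-Prolog (the flag set by the directive applies to the file being loaded, not to queries typed at the toplevel). The names abc_codes/1 and greeting//0 are made up for illustration.

```prolog
:- set_prolog_flag(double_quotes, codes).

% With the flag set, the literal "abc" is read as the list [97,98,99].
abc_codes("abc").

% A double-quoted terminal in a DCG body matches the same code list.
greeting --> "abc".

% ?- abc_codes(Cs).
% Cs = [97, 98, 99].
%
% ?- atom_codes(abc, Cs), phrase(greeting, Cs).
% Cs = [97, 98, 99].
```

The queries use atom_codes/2 rather than a double-quoted literal because the toplevel keeps its own setting of the flag.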

Because using DCGs is so common, SWI-Prolog ships a library of predefined, commonly needed rules in the module dcg/basics.
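
For instance, the size line of the input can be parsed with two rules from dcg/basics, integer//1 and whites//0. This is a small sketch; pair//2 is a hypothetical name and not part of the answer's code.

```prolog
:- use_module(library(dcg/basics)).

% pair(-A, -B): two integers with spaces or tabs between them.
pair(A, B) -->
    integer(A),
    whites,
    integer(B).

% ?- atom_codes('3 4', Cs), phrase(pair(Rows, Cols), Cs).
% Rows = 3, Cols = 4.
```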

SWI-Prolog has a unit test framework (PL-Unit), used above via begin_tests/end_tests.

To make the input data easier to lay out inside a unit test, \c is used: within a quoted literal it skips the line break and the indentation that follows it.
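
The following sketch, assuming SWI-Prolog's handling of \c in quoted text, shows that a literal split over several lines with \c is the same term as the one-line version. The predicate names are invented for the example.

```prolog
% Written on one line.
one_line('3 4\nA B\n').

% The same atom, laid out over several lines: each \c removes the
% line break and the leading whitespace of the next line.
continued('\c
    3 4\n\c
    A B\n\c
').

% ?- one_line(X), continued(Y), X == Y.
% X = Y, Y = '3 4\nA B\n'.
```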

The predicate that drives DCGs is phrase, which comes in two very common variations.

  1. phrase/2 is typically used when the data is not read from a file. I also find it useful when developing and testing DCGs because you can see the entire stream of values. When the data is processed as a list of character codes and the input is a string, you will typically find string_codes/2 used with phrase/2. This is demonstrated in test(1), which uses the closely related phrase/3 to also capture the unparsed remainder.

  2. phrase_from_file/2 is typically used when you have the DCG working and want to read the data directly from a file.
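
The difference between phrase/2 and phrase/3 can be sketched with integer//1 from dcg/basics. The two wrapper predicates below are hypothetical helpers written only for this illustration.

```prolog
:- use_module(library(dcg/basics)).

% leading_integer(+Atom, -N, -RestAtom): phrase/3 parses a prefix and
% hands back whatever input was not consumed.
leading_integer(Atom, N, RestAtom) :-
    atom_codes(Atom, Codes),
    phrase(integer(N), Codes, Rest),
    atom_codes(RestAtom, Rest).

% whole_integer(+Atom, -N): phrase/2 succeeds only if the grammar
% consumes the entire input.
whole_integer(Atom, N) :-
    atom_codes(Atom, Codes),
    phrase(integer(N), Codes).

% ?- leading_integer('42 rest', N, R).
% N = 42, R = ' rest'.
%
% ?- whole_integer('42 rest', N).
% false.
```

Capturing the remainder with phrase/3 is handy while developing a grammar, because it shows exactly where parsing stopped.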


Viewing a unit test in the SWI-Prolog debugger.

If you want to step through a test case with the SWI-Prolog debugger, first start the debugger with

?- gtrace.
true.

Then run a specific test

[trace]  ?- run_tests(data:1).