2
votes

My problem: apply a predicate to filter a list in parallel

I have a list, and I have a predicate. In practice, it's a long list and the predicate takes a while. I want to output just the elements of the list which satisfy the predicate.

I have access to lots of CPU cores, so I thought it would make sense to try to test the elements of the list in blocks in parallel.

Stuff I've tried:

[concurrent_]maplist doesn't work

Seems like it should be easy. concurrent_maplist seems like an easy concurrent version of maplist.

According to another answer on this site, maplist/3 should do exactly what I want. The docs for SWI's maplist/3 suggest maybe it doesn't behave as described in the answer, but in the comments the author of the answer suggests that is an issue with the docs and it should indeed work as expected.

It doesn't seem to work for me.

I've tested this as follows:

:- use_module(library(apply)).

f(A):-
   (A*A < 10),
   writeln(A).

 set(0,[]).
 set(N,T):- N2 is N-1, set(N2,T1), append(T1,[N],T).

This doesn't work:

?- set(4,L), concurrent_maplist(f, L, O).
ERROR: Undefined procedure: f/2
ERROR:   However, there are definitions for:
ERROR:         f/1

Same issue with plain maplist:

set(4,L), maplist(f, L, O).
ERROR: Undefined procedure: f/2
ERROR:   However, there are definitions for:
ERROR:         f/1

include works but isn't parallel

What does do what I want (not in parallel tho) is include:

?- set(11,L), include(f, L, O).
1
2
3
L = [1, 2, 3, 4, 5, 6, 7, 8, 9|...],
O = [1, 2, 3] .

concurrent [edit] runs! But is it parallel?

I've been trying to get it work using concurrent/3 partly following the example of another answer.

Loading this file:

set(0,[]):-!.
set(N,T):- N2 is N-1, set(N2,T1), append(T1,[N],T).
f(A, B):-
    (A*A < B),
    writeln(A).
par_test(A, B):-
    set(A, S),
    findall(f(Good, B), (member(Good, S)), Goals),
    concurrent(8, Goals, []).

...

?- set(4, S), findall(f(Good, 4), member(Good, S), Goals), concurrent(8, Goals, []).
1
false.

But watching htop (with a much longer-running predicate in place of this f), it seems to run on just one core :-(.

How do I do this in parallel?

Is there a way to achieve this with concurrent_maplist, or a similarly easy parallel version of include, or another way to achieve what I'm going for?

2

2 Answers

2
votes

let's test concurrent_maplist:

test_concurrent_maplist(C) :-
    numlist(1,C,Ns),
    concurrent_maplist(prime,Ns,Ps),
    sumlist(Ps,N),
    format('we have ~d primes in first ~d natural numbers~n', [N,C]).
test_maplist(C) :-
    numlist(1,C,Ns),
    maplist(prime,Ns,Ps),
    sumlist(Ps,N),
    format('we have ~d primes in first ~d natural numbers~n', [N,C]).

prime(N,1) :-
    prime(N), !.
prime(_,0).

% from https://en.wikipedia.org/wiki/Primality_test
prime(N) :- N =< 1, !, fail.
prime(N) :- N =< 3, !.
prime(N) :- (0 is N mod 2; 0 is N mod 3), !, fail.
prime(N) :- prime_(N,5).

prime_(N,I) :-
    I * I =< N, !,
    ( ( 0 is N mod I; 0 is N mod (I + 2) ) -> fail
    ; J is I + 1, prime_(N,J)
    ).
prime_(_,_).

on a 4 core machine, it yields

?- time(test_concurrent_maplist(100000)).
we have 9592 primes in first 100000 natural numbers
% 2,100,109 inferences, 1.799 CPU in 2.217 seconds (81% CPU, 1167205 Lips)
true.

?- time(test_concurrent_maplist(1000000)).
we have 78498 primes in first 1000000 natural numbers
% 21,000,109 inferences, 18.244 CPU in 36.764 seconds (50% CPU, 1151063 Lips)
true.

?- time(test_maplist(100000)).
we have 9592 primes in first 100000 natural numbers
% 16,151,437 inferences, 3.942 CPU in 3.951 seconds (100% CPU, 4096903 Lips)
true.

?- time(test_maplist(1000000)).
we have 78498 primes in first 1000000 natural numbers
% 402,953,287 inferences, 102.334 CPU in 102.302 seconds (100% CPU, 3937617 Lips)
true.

undoubtly, an improvement:

?- F1 is (2.217/3.951)*100, F2 is (36.764/102.334)*100.

with longer lists, we approach to 1/3 of elapsed time.

Instead of concurrent_maplist/3, we could stick to concurrent_maplist/2 and store results either in database,global variables, etc...

0
votes

maplist takes the first argument as a predicate and applies as an argument to this predicate one element from each list given as arguments to maplist. This necessarily means all of the list arguments to maplist have the same number of elements. In this case f accepts one argument, but maplist(f, _, _) expects f to accept 2 arguments. Thus the error, Undefined procedure: f/2 which means Prolog cannot find f with 2 arguments.

So maplist doesn't really do what you want here. You want something that is called select in some other languages in which you are only including elements in the second list that pass a filter applied to elements in the first.

Here's a sample implementation:

select(_, [], []).
select(Filter, [R|Rs], Fs) :-
    (   call(Filter, R)
    ->  Fs = [R|SF]
    ;   Fs = SF
    ),
    select(Filter, Rs, SF).

And call, for example, select(f, L, O).

If, as in your example, you have any filters with the property that they're going to succeed up to a point in the list, then continue to fail thereafter, you'd probably want to optimize the filter as such and not have it continue traversing the list after the first failure.

select_until_fail(_, [], []).
select_until_fail(Filter, [R|Rs], Fs) :-
    (   call(Filter, R)
    ->  Fs = [R|SF],
        select_until_fail(Filter, Rs, SF)
    ;   Fs = []
    ).

Or something like that (untested).

After all that, I probably didn't really answer the "concurrency" part of the question.