Parallelizing Aleph on Quantum Prolog
Review of existing Aleph parallelization approaches
From a cursory look at the console output written by Aleph, we can conclude that the vast majority of execution time is spent in the verification phase, where the hypothesized clauses Aleph generates to predict hexose binding sites are assessed quantitatively with respect to which of the given examples verify or falsify them.
For the problem of speeding up verification using parallel execution, we can draw from a relative wealth of existing approaches, which we briefly review here:
Among those, swi-npt, written by the author of SWI Prolog and used as a showcase for introducing SWI Prolog's threading implementation, stands out. The approach taken, though, isn't directly usable for us here, since verification is weakened to a randomized search split across multiple threads rather than complete coverage testing. Note the required source adaptations versus Aleph version 4 weren't published, and it's not entirely clear those would run successfully on up-to-date versions of SWI Prolog (swi-threading-1).
adpvoa presents a relatively simple parallelization
of the coverage test phase of induce/0
based on Aleph version
3 in a shared-nothing cluster of Prolog engines, though with disappointing results.
add-pilp is described as very similar to adpvoa in pilp-strategies-survey, although it doesn't use Aleph.
dtp-ilp-mapreduce, by the author of Aleph, can be best compared to the approach we're describing next, although it involves employing multiple physical machines running isolated Prolog execution engines under an external Map/Reduce framework aimed at big data workloads.
A concurrent maplist implementation of prove/6
Not mentioned in any of the above approaches, Aleph's
prove/6
predicate appears to offer a very straightforward
opportunity for parallelization: all it does is pick an interval
of example indices to call index_prove/6
on,
and then call itself recursively on the remaining list of intervals.
For collecting results, additional housekeeping such as
appending lists and adding counts is performed, but no data
is flowing into subsequent invocations from preceding ones.
Moreover, Aleph's stock prove/6
implementation is already
augmented with a tail-recursive prove/6
variant, hinting at
previous attempts to optimize example verification, albeit in
a serial execution environment.
Hence, prove/6
invites rewriting into a Map/Reduce
style algorithm iterating over a list of interval lists
using e.g. maplist/N
, and then executing prove/6
using
a parallelized variant of maplist/N
.
maplist/N
has been part of Prolog implementations as old
as DEC-10 Prolog, and has been a candidate for inclusion into
ISO Prolog for some time (prologue-maplist).
Regular maplist/N
takes a callable term given in its first
argument, assembles effective callables by appending additional
arguments from elements of lists given in its second and
subsequent arguments, and invokes the so-obtained callables
individually. For example,
maplist(f(1), [ 10, 20, 30], [ X, Y, Z])
is equivalent to invoking
call(f(1, 10, X)),
call(f(1, 20, Y)),
call(f(1, 30, Z)).
A regular (non-concurrent) implementation of maplist/N
exposes
a number of corner cases due to the fact that individual invocations
are executed in sequence, potentially transporting
bound variables from preceding invocations into subsequent ones.
Concurrent implementations of maplist/N
, on the other hand,
execute individual callables in no particular order and limit the
variable dependencies that can occur across individual invocations.
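SWI-Prolog, for instance, ships such a parallelized variant as concurrent_maplist/2..4 in library(thread). A minimal sketch, with square/2 being a hypothetical goal introduced purely for illustration:
:- use_module(library(thread)). % provides concurrent_maplist/2..4 in SWI-Prolog
square(X, Y) :- Y is X * X.
% ?- concurrent_maplist(square, [10, 20, 30], Squares).
% Squares = [100, 400, 900].
Each goal runs in its own worker thread, so the individual goals must not rely on bindings produced by their siblings.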
We now take a look at the stock Aleph prove/6
clauses:
% regular prove/6
prove(_,_,_,[],[],0).
prove(Flags,Type,Clause,[Interval|Intervals],IList,Count):-
index_prove(Flags,Type,Clause,Interval,I1,C1),
prove(Flags,Type,Clause,Intervals,I2,C2),
aleph_append(I2,I1,IList),
Count is C1 + C2.
and note we can replace those trivially by
% prove/6 using maplist/N
prove(Flags,Type,Clause,Intervals,IList,Count):-
maplist(index_prove(Flags,Type,Clause),Intervals,Is,Cs),
prove6_flatten(Is, IList),
prove6_sumlist(Cs, Count).
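Swapping in a parallelized maplist/N is then a one-token change. The following sketch assumes SWI-Prolog's concurrent_maplist/4 from library(thread) and an index_prove/6 free of side effects:
% prove/6 using concurrent_maplist/4 (SWI-Prolog sketch)
prove(Flags,Type,Clause,Intervals,IList,Count):-
    concurrent_maplist(index_prove(Flags,Type,Clause),Intervals,Is,Cs),
    prove6_flatten(Is, IList),
    prove6_sumlist(Cs, Count).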
We additionally supply the following auxiliary predicates:
% auxiliary predicate to sum a list of numbers
prove6_sumlist([], 0).
prove6_sumlist([H|T], N) :- prove6_sumlist(T, M), N is H + M.
% from ECLiPSe's flatten/2 to flatten ResultIntervals list
% (actually, only a single-level flatten/2 predicate is needed)
prove6_flatten(List, Flat) :-
prove6_flatten(List, Flat, []).
prove6_flatten([], Res, Res) :- !.
prove6_flatten([Head|Tail], Res, Cont) :-
!,
prove6_flatten(Head, Res, Cont1),
prove6_flatten(Tail, Cont1, Cont).
prove6_flatten(Term, [Term|Cont], Cont).
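As a quick sanity check of both helpers on hypothetical data:
% ?- prove6_flatten([[1-3, 5-5], [7-9]], Flat).
% Flat = [1-3, 5-5, 7-9].
% ?- prove6_sumlist([3, 1, 3], N).
% N = 7.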
We also make sure that the potential calls to retract/1
and asserta/1
descending from calls to index_prove/6
, as called from prove/6
and used to support caching in Aleph,
are always switched off: we can't expect
to obtain useful results when executing code that mutates the
global database in parallel. We therefore comment out the following
lines of index_prove1/9:
%(Caching = true ->
% (retract('$aleph_local'(example_cache,L)) ->
% asserta('$aleph_local'(example_cache,[Num|L]));
% asserta('$aleph_local'(example_cache,[Num])));
% true),
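Alternatively, since the branch above is guarded by Caching = true anyway, and assuming Aleph's documented caching setting behaves as advertised, leaving caching at its default should have the same effect without editing index_prove1/9:
:- set(caching, false). % Aleph setting; false is the documented default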
Performance report
Note the potential for actual optimization is constrained
because the interval lists generated by Aleph are far from
equally sized. In fact, if we were to stop
Aleph after the first iteration of induce/0
, as we did
in part 1 to get Aleph running on ISO Prolog at all,
we'd just observe Aleph submitting the single interval
1 - 80
to prove/6
, representing the request to verify
examples with index 1 through 80; with only a single interval
submitted, there is nothing to parallelize.
On the other hand, if we look at further iterations,
we notice that the interval list argument supplied to
prove/6
can contain up to 20 interval lists. Aleph
collapses interval lists according to whether the interval
represents a contiguous list of example indices satisfying
the hypothesis, but has no way to further split those
into subinterval lists so as to make intervals equal in
size. Balancing interval lists so that the parallel workloads
take roughly equal amounts of time to perform is thus another
easy optimization with almost-guaranteed benefits that could
be employed here.
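A minimal balancing sketch, assuming a hypothetical helper split_interval/3 that isn't part of Aleph: each Lo-Hi interval is cut into pieces of at most Max examples before being handed to prove/6:
% split a Lo-Hi interval into subintervals of at most Max examples
split_interval(Lo-Hi, Max, [Lo-Hi]) :-
    Hi - Lo < Max, !.
split_interval(Lo-Hi, Max, [Lo-Mid|Rest]) :-
    Mid is Lo + Max - 1,
    Next is Mid + 1,
    split_interval(Next-Hi, Max, Rest).
% ?- split_interval(1-80, 20, Parts).
% Parts = [1-20, 21-40, 41-60, 61-80].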
Still, even with our limited rewrite of prove/6
, we can
already observe an overall speedup of up to 30% on a 16-core
machine compared to serial execution.
That's not bad at all compared to the results reported
by other approaches, and in particular given the
straightforwardness of our maplist/N
refactoring.
Bibliography
- swi-npt
- Wielemaker, J. (2003). Native Preemptive Threads in SWI-Prolog. In Proceedings of ICLP 2003
- pilp-strategies-survey
- Fonseca, N., Silva, O., Camacho, R. (2005). Strategies to Parallelize ILP Systems. Proceedings of the 15th International Conference on Inductive Logic Programming (ILP 2005), volume 3625 of LNAI
- adpvoa
- Konstantopoulos, S.T. (2003). A Data-Parallel Version of Aleph. In Proceedings of the Workshop on Parallel and Distributed Computing for Machine Learning, co-located with ECML/PKDD'2003
- add-pilp
- Graham, J., Page, C.D., Kamal, A. (2003). Accelerating the Drug Design Process through Parallel Inductive Logic Programming Data Mining. CSB 03: Proceedings of the IEEE Computer Society Conference on Bioinformatics
- dtp-ilp-mapreduce
- Srinivasan, A., Faruquie, T.A., Joshi, S. (2012). Data and task parallelism in ILP using MapReduce. Machine Learning, Volume 86, Issue 1, pp. 141-168
- swi-threading-1
- https://stackoverflow.com/questions/39421212/multi-threading-in-swi-prolog-gives-a-little-improvement
- swi-threading-2
- https://discourse.swi-prolog.org/t/regression-with-concurrent-maplist-3-in-8-3-27/4218
- prologue-maplist
- http://www.complang.tuwien.ac.at/ulrich/iso-prolog/prologue#maplist