Parallelizing Aleph on Quantum Prolog
Review of existing Aleph parallelization approaches
From a cursory look at the console output written by Aleph, we can conclude that the vast majority of execution time is spent in the verification phase, where the hypothesized clauses Aleph generates to predict hexose binding sites are assessed quantitatively with respect to which of the given examples verify or falsify them.
For the problem of speeding up verification using parallel execution, we can draw from a relative wealth of existing approaches, which we briefly review here:
Among those, swi-npt, written by the author of SWI Prolog and used as a showcase for introducing SWI Prolog's threading implementation, stands out. The approach taken, though, isn't directly usable for us here, since verification is weakened to a randomized search split across multiple threads rather than complete coverage testing. Note the required source adaptations versus Aleph version 4 weren't published, and it's not entirely clear those would run successfully on up-to-date versions of SWI Prolog (swi-threading-1).
adpvoa presents a relatively simple parallelization
of the coverage test phase of induce/0
based on Aleph version
3 in a shared-nothing cluster of Prolog engines, though with disappointing results.
add-pilp is described as very similar to adpvoa in pilp-strategies-survey, although it doesn't use Aleph.
dtp-ilp-mapreduce, by the author of Aleph, can be best compared to the approach we're describing next, although it involves employing multiple physical machines running isolated Prolog execution engines under an external Map/Reduce framework aimed at big data workloads.
A concurrent maplist implementation of prove/6
Not mentioned in any of the above approaches, Aleph's
prove/6
predicate appears to offer a very straightforward
opportunity for parallelization: all it does is pick an interval
of example indices to call index_prove/6
on,
and then call itself recursively on the remaining list of intervals.
For collecting results, additional housekeeping such as
appending lists and adding counts is performed, but no data
is flowing into subsequent invocations from preceding ones.
Moreover, Aleph's stock prove/6
implementation is already
augmented with a tail-recursive prove/6
variant, hinting at
previous attempts to optimize example verification, albeit in
a serial execution environment.
Hence, prove/6
invites rewriting into a Map/Reduce
style algorithm iterating over a list of interval lists
using e.g. maplist/N
, and then executing prove/6
using
a parallelized variant of maplist/N
.
maplist/N
has been part of Prolog implementations as old
as DEC-10 Prolog, and has been a candidate for inclusion into
ISO Prolog for some time (prologue-maplist).
Regular maplist/N
takes a callable term given in its first
argument, assembles effective callables by appending additional
arguments from elements of lists given in its second and
subsequent arguments, and invokes the so-obtained callables
individually. For example,
maplist(f(1), [ 10, 20, 30], [ X, Y, Z])
is equivalent to invoking
call(f(1, 10, X)),
call(f(1, 20, Y)),
call(f(1, 30, Z)).
A regular (non-concurrent) implementation of maplist/N
exposes
a number of corner cases due to the fact that individual invocations
are executed in sequence, potentially transporting
bound variables from preceding invocations into subsequent ones.
Concurrent implementations of maplist/N
, on the other hand,
execute individual callables in no particular order and limit the
variable dependencies that can occur across individual invocations.
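SWI-Prolog, for instance, ships such a parallelized variant as concurrent_maplist/2..4 in library(thread). A minimal sketch, with square/2 being a hypothetical goal introduced purely for illustration:
:- use_module(library(thread)). % provides concurrent_maplist/2..4 in SWI-Prolog
square(X, Y) :- Y is X * X.
% ?- concurrent_maplist(square, [10, 20, 30], Squares).
% Squares = [100, 400, 900].
Each goal runs in its own worker thread, so the individual goals must not rely on bindings produced by their siblings.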
We now take a look at the stock Aleph prove/6
clauses:
% regular prove/6
prove(_,_,_,[],[],0).
prove(Flags,Type,Clause,[Interval|Intervals],IList,Count):-
index_prove(Flags,Type,Clause,Interval,I1,C1),
prove(Flags,Type,Clause,Intervals,I2,C2),
aleph_append(I2,I1,IList),
Count is C1 + C2.
and note we can replace those trivially by
% prove/6 using maplist/N
prove(Flags,Type,Clause,Intervals,IList,Count):-
maplist(index_prove(Flags,Type,Clause),Intervals,Is,Cs),
prove6_flatten(Is, IList),
prove6_sumlist(Cs, Count).
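Swapping in a parallelized maplist/N is then a one-token change. The following sketch assumes SWI-Prolog's concurrent_maplist/4 from library(thread) and an index_prove/6 free of side effects:
% prove/6 using concurrent_maplist/4 (SWI-Prolog sketch)
prove(Flags,Type,Clause,Intervals,IList,Count):-
    concurrent_maplist(index_prove(Flags,Type,Clause),Intervals,Is,Cs),
    prove6_flatten(Is, IList),
    prove6_sumlist(Cs, Count).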
We additionally supply the following auxiliary predicates:
% auxiliary predicate to sum a list of numbers
prove6_sumlist([], 0).
prove6_sumlist([H|T], N) :- prove6_sumlist(T, M), N is H + M.
% from ECLiPSe's flatten/2 to flatten ResultIntervals list
% (actually, only a single-level flatten/2 predicate is needed)
prove6_flatten(List, Flat) :-
prove6_flatten(List, Flat, []).
prove6_flatten([], Res, Res) :- !.
prove6_flatten([Head|Tail], Res, Cont) :-
!,
prove6_flatten(Head, Res, Cont1),
prove6_flatten(Tail, Cont1, Cont).
prove6_flatten(Term, [Term|Cont], Cont).
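As a quick sanity check of both helpers on hypothetical data:
% ?- prove6_flatten([[1-3, 5-5], [7-9]], Flat).
% Flat = [1-3, 5-5, 7-9].
% ?- prove6_sumlist([3, 1, 3], N).
% N = 7.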
We also make sure that the potential calls to retract/1
and asserta/1
descending from calls to index_prove/6
, as called from prove/6
and used to support caching in Aleph,
are always switched off: we can't expect
to obtain useful results when executing code that mutates the
global database in parallel. We therefore comment out the following
lines of index_prove1/9:
%(Caching = true ->
% (retract('$aleph_local'(example_cache,L)) ->
% asserta('$aleph_local'(example_cache,[Num|L]));
% asserta('$aleph_local'(example_cache,[Num])));
% true),
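Alternatively, since the branch above is guarded by Caching = true anyway, and assuming Aleph's documented caching setting behaves as advertised, leaving caching at its default should have the same effect without editing index_prove1/9:
:- set(caching, false). % Aleph setting; false is the documented default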
Performance report
Note the potential for actual optimization is constrained
because the interval lists generated by Aleph are far from
equally sized. In fact, if we were to stop
Aleph after the first iteration of induce/0
, as we did
in part 1 to get Aleph running on ISO Prolog at all,
we'd just observe Aleph submitting the single interval
1 - 80
to prove/6
, representing the request to verify
examples with index 1 through 80; with only a single interval
submitted, there is nothing to parallelize.
On the other hand, if we look at further iterations,
we notice that the interval list argument supplied to
prove/6
can contain up to 20 interval lists. Aleph
collapses interval lists according to whether the interval
represents a contiguous list of example indices satisfying
the hypothesis, but has no way to further split those
into subinterval lists so as to make intervals equal in
size. Balancing interval lists so that the parallel workloads
take roughly equal amounts of time to perform is thus another
easy optimization with almost-guaranteed benefits that could
be employed here.
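A minimal balancing sketch, assuming a hypothetical helper split_interval/3 that isn't part of Aleph: each Lo-Hi interval is cut into pieces of at most Max examples before being handed to prove/6:
% split a Lo-Hi interval into subintervals of at most Max examples
split_interval(Lo-Hi, Max, [Lo-Hi]) :-
    Hi - Lo < Max, !.
split_interval(Lo-Hi, Max, [Lo-Mid|Rest]) :-
    Mid is Lo + Max - 1,
    Next is Mid + 1,
    split_interval(Next-Hi, Max, Rest).
% ?- split_interval(1-80, 20, Parts).
% Parts = [1-20, 21-40, 41-60, 61-80].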
Still, even with our limited rewrite of prove/6
, we can
already observe an overall speedup of up to 30% on a 16-core
machine compared to serial execution.
That's not bad at all compared to the results reported
by other approaches, and in particular given the
straightforwardness of our maplist/N
refactoring.
Bibliography
- swi-npt
- Wielemaker, J. (2003). Native Preemptive Threads in SWI-Prolog. In Proceedings of ICLP 2003
- pilp-strategies-survey
- Fonseca, N., Silva, O., Camacho, R. (2005). Strategies to Parallelize ILP Systems. Proceedings of the 15th International Conference on Inductive Logic Programming (ILP 2005), volume 3625 of LNAI
- adpvoa
- Konstantopoulos, S.T. (2003). A Data-Parallel Version of Aleph. In Proceedings of the Workshop on Parallel and Distributed Computing for Machine Learning, co-located with ECML/PKDD'2003
- add-pilp
- Graham, J., Page, C.D., Kamal, A. (2003). Accelerating the Drug Design Process through Parallel Inductive Logic Programming Data Mining. CSB 03: Proceedings of the IEEE Computer Society Conference on Bioinformatics
- dtp-ilp-mapreduce
- Srinivasan, A., Faruquie, T.A., Joshi, S. (2012). Data and task parallelism in ILP using MapReduce. Machine Learning, Volume 86, Issue 1, pp. 141-168
- swi-threading-1
- https://stackoverflow.com/questions/39421212/multi-threading-in-swi-prolog-gives-a-little-improvement
- swi-threading-2
- https://discourse.swi-prolog.org/t/regression-with-concurrent-maplist-3-in-8-3-27/4218
- prologue-maplist
- http://www.complang.tuwien.ac.at/ulrich/iso-prolog/prologue#maplist