Using Large Language Models to generate Prolog planners

Combining the strengths of Large Language Models (LLMs) with Prolog's combinatorial search is a natural fit that delivers tangible progress in mainstream automated planning.

State of the art in LLM-generated planners

Part of the public discussion about LLMs revolves around unachievably high goals such as "General AI", but LLMs are, after all, statistical language models, and excel foremost at language translation and summarization tasks. And while LLMs can be trained to do almost anything, including optimization and other math and reasoning tasks, it is abundantly clear that results are determined by the training data sets used, and remain highly volatile and unpredictable for problems outside those sets, a fact well acknowledged in papers such as aiw and posp.

Rather than having LLMs perform planning themselves, it's thus only natural to complement LLMs with the raw combinatorial power of Prolog, much like a human would reach for a pocket calculator. Or, as tnltopgwl formulates it:

Our empirical results [...] show that LLMs are much better suited towards translation rather than planning.

Putting LLMs to use for translating natural language into Prolog is also a straightforward extension of Prolog's original goals, considering Prolog was originally devised for natural language processing. Planning being another field of application for Prolog right from the start, it isn't surprising that many researchers are experimenting with Prolog, Constraint Logic Programming (CLP), and other logic programming (LP) languages as targets for LLM code generation (see the bibliography for a small selection of reports).

Practitioners are also turning to solutions targeting Prolog (rrbnl) after having observed (swipldiscourse1) that targeting domain-specific planning languages and formalisms which lack Prolog's inherent logical underpinnings, such as the one discussed in tnltopgwl, causes

initial promising results to hit a road block.

And while chain-of-thought (CoT) approaches have reported vast improvements for reasoning LLMs, those results remain highly specific to the training data and thus require training sets that neither exist nor can easily be generated (as briefly discussed below), which is acknowledged in cotplanning:

[Our results] spotlights drawbacks of chain of thought, especially the sharp tradeoff between possible performance gains and the amount of human labor necessary to generate examples with correct reasoning traces.

However, the essential CoT disciplines of guiding LLMs using templated solutions, controlled language, and stratified, stepwise solving of sub-goals come naturally and directly when instructing LLMs to generate domain-specific Prolog source, further suggesting Prolog code as an excellent fit for an LLM target.

Extending container planning into generic planning

We briefly mention WARPLAN here, one of the earliest Prolog-based planning systems, created by David Warren, who is known in the Prolog world for introducing the Warren Abstract Machine as the basis for performant 1980s Prolog implementations, with many Prolog engines still using it today. WARPLAN was a straightforward implementation of STRIPS planning in Prolog, adding only a convention for action post-conditions in the form of add(pred(...)) and del(pred(...)) literals.
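To illustrate the convention, here's a hypothetical sketch of a WARPLAN-style action, with domain predicate names of our own choosing rather than Warren's original code:

```prolog
% Hypothetical WARPLAN-style action sketch: loading a container onto a
% berthed ship. Preconditions are ordinary goals over the current state;
% post-conditions are stated declaratively as add/1 and del/1 literals
% to be interpreted by the planner, not as executed state changes.
load(Container, Ship, Quay) :-
    at(Container, Quay),
    berthed(Ship, Quay),
    del(at(Container, Quay)),
    add(on(Container, Ship)).
```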

In our planner for the container problem developed in Planning container shipping using Prolog, we instead make use of the Prolog asserta/1, assertz/1, and retract/1 builtins to encode the state changes produced by actions. The benefit of doing so (apart from not being restricted to merely propositional states) becomes evident when putting LLMs to use: generating Prolog action predicates containing vanilla asserta/1, assertz/1, and retract/1 builtins is directly supported by existing LLMs and training data, whereas WARPLAN-like abstractions and formalisms would need extra training to be practical.
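For illustration, an action predicate in this style (a hypothetical sketch, with domain predicate names of our own choosing) queries the current state as preconditions and effects its changes directly with the builtins:

```prolog
% Hypothetical action sketch in the style the LLM is asked to generate:
% preconditions query current state facts, post-conditions update the
% state destructively using retract/1 and assertz/1.
load(Container, Ship, Quay) :-
    at(Container, Quay),          % precondition: container on the quay
    berthed(Ship, Quay),          % precondition: ship berthed at the quay
    retract(at(Container, Quay)), % effect: container leaves the quay
    assertz(on(Container, Ship)). % effect: container now on the ship
```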

As explained in Advanced combinatorial planning, the destructive nature of asserta/1 and friends is usually insufficient to capture even slightly elaborate planning models, but that's not a problem, since we can automatically rewrite action predicates into a form using the vardb feature, as already explained in the above reference. That's right: we expect the LLM to translate our input problem description into basically naive Prolog code, and we then rewrite the code the LLM generates into a refined form.

We're also instructing the LLM not to attempt to generate an actual solution predicate, which might need further explanation: what we want from the LLM is Prolog code for state facts, action predicates, and a check predicate only.

The actual combinatorial search, trying and backtracking over actions, is not part of what we expect the LLM to produce for us; instead, we supply minimal generic planning library code that picks up and backtracks over the LLM-generated predicates in standard Prolog fashion, as follows:

% to enumerate state predicates:
% predicates without clause body (and without vars)
state_pred(P/N) :-
    current_predicate(P/N),
    \+ (P = 'action_pred'; P = 'check_pred'; P = 'state_pred'; P = 'main'),
    functor(T, P, N),
    clause(T, true).

% to identify the check predicate:
% the single parameterless predicate whose body is a
% conjunction entirely over primitive state predicates
check_pred(P/0) :-
    current_predicate(P/0),
    \+ (P = 'action_pred'; P = 'check_pred'; P = 'state_pred'; P = 'main'),
    clause(P, Body), \+ Body = true.

% to enumerate action predicates:
% any predicate that is neither a state nor a check predicate,
% nor one of the planning lib predicates defined here
action_pred(P/N) :-
    current_predicate(P/N),
    \+ (P = 'action_pred'; P = 'check_pred'; P = 'state_pred'; P = 'main'),
    \+ state_pred(P/N),
    \+ check_pred(P/N).

% main: generic routine for checking if the desired state has been reached
main :-
    check_pred(P/0),
    call(P),
    % ... code for informing about having completed planning omitted ...
    !.

% main: generic planning routine executed recursively as long as no
% desired state is reached
main :-
    action_pred(P/N),
    functor(T, P, N),
    call(T),
% ... code for informing about pursuing a planning action omitted ...
    main.
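To see the library clauses above at work, consider a minimal, hypothetical domain of our own devising (all names are illustrative): a single package to be moved between two locations with one action. Loaded together with the library, querying main finds the one-step plan by enumerating and calling move/2 until goal_reached holds:

```prolog
:- dynamic(at/2).   % at/2 is retracted/asserted by the action

% state facts (Datalog-style, bodyless): picked up by state_pred/1
location(loc_a).
location(loc_b).
at(package1, loc_a).

% an action predicate: callable with unbound arguments, so the generic
% main/0 clause can enumerate and backtrack over its instantiations
move(X, To) :-
    location(To),
    at(X, From),
    From \= To,
    retract(at(X, From)),
    assertz(at(X, To)).

% the single parameterless check predicate: picked up by check_pred/1
goal_reached :-
    at(package1, loc_b).
```

Note that location/1 facts are classified as state predicates too, which is harmless since main/0 only calls action predicates; note also that the generic main/0 recursion is depth-unbounded, so domains with reversible actions may need a depth bound.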

Prolog planner generation prompting guide

With our modest Prolog planning infrastructure in place, we can now rather directly prompt an instruction-tuned LLM to generate planners for us, given a text description of the problem, using a system prompt such as the following (incorporating a text description of the AIPS logistics problem as an in-context example):

As an expert in Prolog programming, you're supposed to generate
Prolog source code in the WARPLAN style gleaned from a given planning
problem description text. A WARPLAN style Prolog planning program consists
of Prolog facts, restricted to Datalog syntax, representing the current
state of the planning domain at any time, and clauses representing
possible actions with parameters that query and check the current state
and change state facts using the retract and assertz Prolog builtins, as
well as a predicate for checking whether a desired state is reached. A
generic main predicate attempting to invoke any finite number of action
predicates until the desired state is reached can be omitted.

Include portions of the problem text description as Prolog source comment
along with the state fact and action predicate clauses representing the
respective piece of text in generated source. Only produce ISO Prolog
without consulting or otherwise importing Prolog library code. Do not
produce helper clauses. Do not use if-then or if-then-else operators in
generated Prolog code. Do not generate output other than Prolog source
code with comments.

What follows is an example problem text with a Prolog solution.
Problem description text:
Packages must be transported from and to locations inside the six cities
named city1, city2, city3, city4, city5, and city6 using trucks, and
between those cities using airplanes. Each city has two locations:
city1 has locations city11 and city12, city2 has locations city21 and
city22, city3 has locations city31 and city32, city4 has locations city41
and city42, city5 has locations city51 and city52, and city6 has locations
city61 and city62. Initially, package1 is located at city21, package2 is
located at city12, package3 is located at city11, package4 is located at
city11, package5 is located at city42, and package6 is located at city31.
The aim is to move package1 to city21, package2 to city62, package3 to
city61, package4 to city32, package5 to city62, and package6 to city12.
In each city, at the locations city11, city21, city31, city41, city51,
and city61, there's a truck to transport packages to other locations
within the city. There are moreover two airplanes, named plane1
and plane2, located at the airport of city1 at location11
and at the airport of city2 at location22. The airports
of the other cities have the locations location32 in city3,
location42 in city4, location52 in city5, and location62 in city6.

To move packages, a sequence of actions as follows must be
performed until the packages have reached their destination:
The action load_truck will place a package in a truck at a given location.
The action unload_truck will unload a package from a truck at a given
location.
The action fly_airplane will fly an airplane from one airport to another.
The action unload_airplane will unload a package from an airplane at a
 given location.
The action load_airplane will load an airplane with a package at a given
location.

Prolog program to solve the problem:
... (code following here) ...

The Prolog state, action, and check predicates solving the example problem are not listed here. As explained, the actual main solving routine is not part of the prompt; the generic one above is appended to the LLM-generated result.

Using the above template as the system prompt, what needs to follow in a full prompt is the description of the problem instance as "user" role input, typically starting with a sentence like this:

Write a Prolog program to find a sequence of actions to ...

Notes

Note that in the code actually in use on this site, we add some output messages to inform the user if and when a plan has been found, and of the sequence of actions the plan consists of.

Disregarding additional code for outputting found solutions, searching for a plan is now as easy as invoking the main goal:

:- initialization( main ).

More toy planning problems and training/test data

Results were obtained by attempting to generate planners for our running container planning example problem, as well as for the following problems, serving as additional validation or in-context examples.

Container ship packing

a complementary ad-hoc container planning problem involving onboard stacking order, much simpler than the container planning problem we're using as the running example here

Airplanes/trucks

a problem derived from the "logistics" AI Planning Competition (AIPS) problem, partially explained in osllpds

Dock/worker/robots

an artificial problem involving planning moves of robotic van carriers and gantry cranes at container terminals, attributed to dwr (but note the problem had already been part of AI Planning Competition suites much earlier)

Airplane maintenance

another AIPS problem for which we created a text description to complement our problem suite with a scheduling (rather than planning) problem; note that recognizing and rejecting such problems, or dispatching them to a scheduling template accordingly, requires model customization such as QLoRA specialization for text classification

Realistic logistics optimisation problems

While generating solutions for toy examples is certainly impressive, realistic problems involve much more elaborate business, marine, geographic, fuel pricing/consumption, and other rules, as well as defining cost models and assuming a role to optimize for in the logistics process, such as carrier vs. shipping company vs. container terminal operator.

As can be seen from our selection of toy planning problems already, while many classic planning problems exist as PDDL programs for AI planning competitions, textual descriptions for these are lacking or non-existent. Even the original robotic planning problem for STRIPS was presented as a floor plan (a graphic rather than textual description) for moving Shakey the robot around, with additional blocks-world-like actions involving picking, pushing, and climbing.

It's tempting, then, to instruct very large LLMs (405B or newer 70B models) incorporating much larger training text sets to generate problem descriptions as text, in line with the predominant use case for these large LLMs: generating training data for specializing smaller LLMs. But our results from instructing e.g. Llama 405B models to generate self-contained descriptions for realistic standard logistics planning problems such as the berth allocation problem, while not entirely discouraging, show that substantial manual inspection and editing would be required; so much so, in fact, that it's easier to code the solution in Prolog directly.

The fact that realistic planning problems aren't commonly formulated as text suggests that fully self-contained textual descriptions from scratch may be more useful for ad-hoc problems, but not necessarily for logistics at this stage. Logistics optimization is an established field of planning after all, and progress may evolve in incremental steps using experimentation and regular software engineering practices, for which Prolog, as a standardized programming language, is of course ideally suited.

Extending planners to solutions

Whether generated, hand-crafted, or hand-polished, extending Prolog planners into practical solutions involves additional off-the-shelf components not discussed here:

Links/bibliography

posp
Kambhampati, S., Valmeekam, K., Guan, L., Verma, M., Stechly, K., Bhambri, S., Saldyt, L., Murthy, A. (2024) Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235
aiw
Nezhurina, M., Cipolina-Kun, L., Cherti, M., Jitsev, J. (2024) Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-of-the-Art Large Language Models. NeurIPS Scientific Methods for Understanding Deep Learning Workshop (SciDL)
tnltopgwl
Xie, Y., Yu, C., Zhu, T., Bai, J., Gong, Z., Soh, H. Translating Natural Language to Planning Goals with Large-Language Models
dwr
Nau, D., Ghallab, M., Traverso, P. (2004) Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
osllpds
Paul, G., Röger, G., Keller, T., Helmert, M. (2017) Optimal Solutions to Large Logistics Planning Domain Problems. Tenth Annual Symposium on Combinatorial Search (SoCS), Vol. 8, No. 1
swipldiscourse1
LLM and Prolog, a Marriage made in Heaven? https://swi-prolog.discourse.group/t/llm-and-prolog-a-marriage-in-heaven/7666/2
rrbnl
Borazjanizadeh, N., Piantadosi, S.T. Reliable Reasoning Beyond Natural Language. https://arxiv.org/abs/2407.11373
warplan
Warren, D. (1974) A System for Generating Plans. Dept. of Computational Logic, University of Edinburgh School of Artificial Intelligence
cotplanning
Stechly, K., Valmeekam, K., Kambhampati, S. (2024) Chain of Thoughtlessness? An Analysis of CoT in Planning. Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)