Using Large Language Models to generate Prolog planners

Combining the strengths of Large Language Models (LLMs) with Prolog's combinatorial search is a natural fit that delivers tangible progress in mainstream automated planning.

State of the art in LLM-generated planners

Part of the public discussion about LLMs revolves around unachievably high goals such as "General AI", but LLMs are, after all, statistical language models, and excel foremost at language translation and summarization tasks. And while LLMs can be trained to do almost anything, including optimization and other math and reasoning tasks, it is abundantly clear that results are determined by the training data sets used, and remain highly volatile and unpredictable for problems outside those sets, a fact well acknowledged in papers such as aiw and posp.

Rather than having LLMs perform planning themselves, it's thus only natural to complement LLMs with the raw combinatorial power of Prolog, much like a human would reach for a pocket calculator. Or, as tnltopgwl formulates it:

Our empirical results [...] show that LLMs are much better suited towards translation rather than planning.

Putting LLMs to use for translating natural language into Prolog is also a straightforward extension of Prolog's original goals, considering Prolog was originally devised for natural language processing. Planning being another field of application for Prolog right from the start, it isn't surprising that many researchers are experimenting with Prolog, Constraint Logic Programming (CLP), and other logic programming (LP) languages as targets for LLM code generation (see the bibliography for a small selection of reports).

Practitioners are also turning to solutions targeting Prolog (rrbnl) after having observed (swipldiscourse1) that targeting domain-specific planning languages and formalisms which lack Prolog's inherent logical underpinnings, such as the one discussed in tnltopgwl, causes

initial promising results to hit a road block.

And while chain-of-thought (CoT) approaches have reported vast improvements for reasoning LLMs, those results remain highly specific to the training data and thus require training sets that neither exist nor can easily be generated (as briefly discussed below), which is acknowledged in cotplanning:

[Our results] spotlights drawbacks of chain of thought, especially the sharp tradeoff between possible performance gains and the amount of human labor necessary to generate examples with correct reasoning traces.

However, the essential CoT disciplines of guiding LLMs using templated solutions, controlled language, and stratified, stepwise solving of sub-goals come naturally and directly when instructing LLMs to generate domain-specific Prolog source, further suggesting Prolog code as an excellent fit for an LLM target.

Extending container planning into generic planning

We briefly mention WARPLAN here, one of the earliest Prolog-based planning systems, created by David Warren, who is known in the Prolog world for introducing the Warren Abstract Machine as the basis for performant 1980s Prolog implementations, with many Prolog engines still using it today. WARPLAN was a straightforward implementation of STRIPS planning in Prolog, adding only a convention for action post-conditions in the form of add(pred(...)) and del(pred(...)) literals.
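To illustrate the convention, here's a hypothetical sketch of a WARPLAN-style action, with domain predicate names of our own choosing rather than Warren's original code:

```prolog
% Hypothetical WARPLAN-style action sketch: loading a container onto a
% berthed ship. Preconditions are ordinary goals over the current state;
% post-conditions are stated declaratively as add/1 and del/1 literals
% to be interpreted by the planner, not as executed state changes.
load(Container, Ship, Quay) :-
    at(Container, Quay),
    berthed(Ship, Quay),
    del(at(Container, Quay)),
    add(on(Container, Ship)).
```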

In our planner for the container problem developed in Planning container shipping using Prolog, we instead make use of the Prolog asserta/1, assertz/1, and retract/1 builtins to encode the state changes produced by actions. The benefit of doing so (apart from not being restricted to merely propositional states) becomes evident when putting LLMs to use: generating Prolog action predicates containing vanilla asserta/1, assertz/1, and retract/1 builtins is directly supported by existing LLMs and training data, whereas WARPLAN-like abstractions and formalisms would need extra training to be practical.
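For illustration, an action predicate in this style (a hypothetical sketch, with domain predicate names of our own choosing) queries the current state as preconditions and effects its changes directly with the builtins:

```prolog
% Hypothetical action sketch in the style the LLM is asked to generate:
% preconditions query current state facts, post-conditions update the
% state destructively using retract/1 and assertz/1.
load(Container, Ship, Quay) :-
    at(Container, Quay),          % precondition: container on the quay
    berthed(Ship, Quay),          % precondition: ship berthed at the quay
    retract(at(Container, Quay)), % effect: container leaves the quay
    assertz(on(Container, Ship)). % effect: container now on the ship
```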

As explained in Advanced combinatorial planning, the destructive nature of asserta/1 and friends is usually insufficient to capture even slightly elaborate planning models, but that's not a problem, since we can automatically rewrite action predicates into a form using the vardb feature, as already explained in the above reference. That's right: we expect the LLM to translate our input problem description into basically naive Prolog code, and we then rewrite the code the LLM generates into a refined form.

We're also instructing the LLM not to attempt to generate an actual solution predicate, which might need further explanation: what we want from the LLM is Prolog code for state facts, action predicates, and a check predicate only.

The actual combinatorial search, trying and backtracking over actions, is not part of what we expect the LLM to produce for us; instead, we supply minimal generic planning library code that picks up and backtracks over the LLM-generated predicates in standard Prolog fashion, as follows:

% to enumerate state predicates:
% predicates without clause body (and without vars)
state_pred(P/N) :-
    current_predicate(P/N),
    \+ (P = 'action_pred'; P = 'check_pred'; P = 'state_pred'; P = 'main'),
    functor(T, P, N),
    clause(T, true).

% to identify the check predicate:
% the single parameterless predicate whose body is a
% conjunction entirely over primitive state predicates
check_pred(P/0) :-
    current_predicate(P/0),
    \+ (P = 'action_pred'; P = 'check_pred'; P = 'state_pred'; P = 'main'),
    clause(P, Body), \+ Body = true.

% to enumerate action predicates:
% any predicate that is neither a state nor a check predicate,
% nor one of the planning lib predicates defined here
action_pred(P/N) :-
    current_predicate(P/N),
    \+ (P = 'action_pred'; P = 'check_pred'; P = 'state_pred'; P = 'main'),
    \+ state_pred(P/N),
    \+ check_pred(P/N).

% main: generic routine for checking if the desired state has been reached
main :-
    check_pred(P/0),
    call(P),
    % ... code for informing about having completed planning omitted ...
    !.

% main: generic planning routine executed recursively as long as no
% desired state is reached
main :-
    action_pred(P/N),
    functor(T, P, N),
    call(T),
% ... code for informing about pursuing a planning action omitted ...
    main.
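To see the library clauses above at work, consider a minimal, hypothetical domain of our own devising (all names are illustrative): a single package to be moved between two locations with one action. Loaded together with the library, querying main finds the one-step plan by enumerating and calling move/2 until goal_reached holds:

```prolog
:- dynamic(at/2).   % at/2 is retracted/asserted by the action

% state facts (Datalog-style, bodyless): picked up by state_pred/1
location(loc_a).
location(loc_b).
at(package1, loc_a).

% an action predicate: callable with unbound arguments, so the generic
% main/0 clause can enumerate and backtrack over its instantiations
move(X, To) :-
    location(To),
    at(X, From),
    From \= To,
    retract(at(X, From)),
    assertz(at(X, To)).

% the single parameterless check predicate: picked up by check_pred/1
goal_reached :-
    at(package1, loc_b).
```

Note that location/1 facts are classified as state predicates too, which is harmless since main/0 only calls action predicates; note also that the generic main/0 recursion is depth-unbounded, so domains with reversible actions may need a depth bound.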

Prolog planner generation prompting guide

With our modest Prolog planning infrastructure in place, we can now rather directly prompt an instruction-tuned LLM to generate planners for us, given a text description of the problem, using a system prompt such as the following (incorporating a text description of the AIPS logistics problem as an in-context example):

As an expert in Prolog programming, you're supposed to generate
Prolog source code in the WARPLAN style gleaned from a given planning
problem description text. A WARPLAN style Prolog planning program consists
of Prolog facts, restricted to Datalog syntax, representing the current
state of the planning domain at any time, and clauses representing
possible actions with parameters that query and check the current state
and change state facts using the retract and assertz Prolog builtins, as
well as a predicate for checking whether a desired state is reached. A
generic main predicate attempting to invoke any finite number of action
predicates until the desired state is reached can be omitted.

Include portions of the problem text description as Prolog source comment
along with the state fact and action predicate clauses representing the
respective piece of text in generated source. Only produce ISO Prolog
without consulting or otherwise importing Prolog library code. Do not
produce helper clauses. Do not use if-then or if-then-else operators in
generated Prolog code. Do not generate output other than Prolog source
code with comments.

What follows is an example problem text with a Prolog solution.
Problem description text:
Packages must be transported from and to locations inside the six cities
named city1, city2, city3, city4, city5, and city6 using trucks, and
between those cities using airplanes. Each city has two locations:
city1 has locations city11 and city12, city2 has locations city21 and
city22, city3 has locations city31 and city32, city4 has locations city41
and city42, city5 has locations city51 and city52, and city6 has locations
city61 and city62. Initially, package1 is located at city21, package2 is
located at city12, package3 is located at city11, package4 is located at
city11, package5 is located at city42, and package6 is located at city31.
The aim is to move package1 to city21, package2 to city62, package3 to
city61, package4 to city32, package5 to city62, and package6 to city12.
In each city, at the locations city11, city21, city31, city41, city51,
and city61, there's a truck to transport packages to other locations
within the city. There are moreover two airplanes, named plane1
and plane2, located at the airport of city1 at location11
and at the airport of city2 at location22. The airports
of the other cities have the locations location32 in city3,
location42 in city4, location52 in city5, and location62 in city6.

To move packages, a sequence of actions as follows must be
performed until the packages have reached their destination:
The action load_truck will place a package in a truck at a given location.
The action unload_truck will unload a package from a truck at a given
location.
The action fly_airplane will fly an airplane from one airport to another.
The action unload_airplane will unload a package from an airplane at a
 given location.
The action load_airplane will load an airplane with a package at a given
location.

Prolog program to solve the problem:
... (code following here) ...

The Prolog state, action, and check predicates solving the example problem are not listed here. As explained, the actual main solving routine is not part of the prompt; the generic one above is appended to the LLM-generated result.

Using the above template as the system prompt, what needs to follow in a full prompt is the description of the problem instance as "user" role input, typically starting with a sentence like this:

Write a Prolog program to find a sequence of actions to ...

Notes

Note that in the code actually in use on this site, we add some output messages to inform the user if and when a plan has been found, and of the sequence of actions the plan consists of.

Disregarding additional code for outputting found solutions, searching for a plan is now as easy as invoking the main goal:

:- initialization( main ).

More toy planning problems and training/test data

Results were obtained by attempting to generate planners for our running container planning example problem, as well as for the following problems, serving as additional validation or in-context examples.

Container ship packing

a complementary ad-hoc container planning problem involving onboard stacking order, much simpler than the container planning problem we're using as the running example here

Airplanes/trucks

a problem derived from the "logistics" AI Planning Competition (AIPS) problem, partially explained in osllpds

Dock/worker/robots

an artificial problem involving planning moves of robotic van carriers and gantry cranes at container terminals, attributed to dwr (but note the problem had already been part of AI Planning Competition suites much earlier)

Airplane maintenance

another AIPS problem for which we created a text description to complement our problem suite with a scheduling (rather than planning) problem; note that recognizing and rejecting such problems, or dispatching them to a scheduling template accordingly, requires model customization such as QLoRA specialization for text classification

Realistic logistics optimisation problems

While generating solutions for toy examples is certainly impressive, realistic problems involve much more elaborate business, marine, geographic, fuel pricing/consumption, and other rules, as well as defining cost models and assuming a role to optimize for in the logistics process, such as carrier vs. shipping company vs. container terminal operator.

As can be seen from our selection of toy planning problems already, while many classic planning problems exist as PDDL programs for AI planning competitions, textual descriptions for these are lacking or non-existent. Even the original robotic planning problem for STRIPS was presented as a floor plan (a graphic rather than textual description) for moving Shakey the robot around, with additional blocks-world-like actions involving picking, pushing, and climbing.

It's tempting, then, to instruct very large LLMs (405B or newer 70B models) incorporating much larger training text sets to generate problem descriptions as text, in line with the predominant use case for these large LLMs: generating training data for specializing smaller LLMs. But our results from instructing e.g. Llama 405B models to generate self-contained descriptions for realistic standard logistics planning problems such as the berth allocation problem, while not entirely discouraging, show that substantial manual inspection and editing would be required; so much so, in fact, that it's easier to code the solution in Prolog directly.

The fact that realistic planning problems aren't commonly formulated as text suggests that fully self-contained textual descriptions from scratch may be more useful for ad-hoc problems, but not necessarily for logistics at this stage. Logistics optimization is an established field of planning after all, and progress may evolve in incremental steps using experimentation and regular software engineering practices, for which Prolog, as a standardized programming language, is of course ideally suited.

Extending planners to solutions

Whether generated, hand-crafted, or hand-polished, extending Prolog planners into practical solutions involves additional off-the-shelf components not discussed here:

Links/bibliography

posp
Kambhampati, S., Valmeekam, K., Guan, L., Verma, M., Stechly, K., Bhambri, S., Saldyt, L., Murthy, A. (2024) Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235
aiw
Nezhurina, M., Cipolina-Kun, L., Cherti, M., Jitsev, J. (2024) Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-of-the-Art Large Language Models. NeurIPS Scientific Methods for Understanding Deep Learning Workshop (SciDL)
tnltopgwl
Xie, Y., Yu, C., Zhu, T., Bai, J., Gong, Z., Soh, H. Translating Natural Language to Planning Goals with Large-Language Models
dwr
Nau, D., Ghallab, M., Traverso, P. (2004) Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
osllpds
Paul, G., Röger, G., Keller, T., Helmert, M. (2017) Optimal Solutions to Large Logistics Planning Domain Problems. Tenth Annual Symposium on Combinatorial Search (SoCS), Vol. 8, No. 1
swipldiscourse1
LLM and Prolog, a Marriage made in Heaven? https://swi-prolog.discourse.group/t/llm-and-prolog-a-marriage-in-heaven/7666/2
rrbnl
Borazjanizadeh, N., Piantadosi, S.T. Reliable Reasoning Beyond Natural Language. https://arxiv.org/abs/2407.11373
warplan
Warren, D. (1974) A System for Generating Plans. Dept. of Computational Logic, University of Edinburgh School of Artificial Intelligence
cotplanning
Stechly, K., Valmeekam, K., Kambhampati, S. (2024) Chain of Thoughtlessness? An Analysis of CoT in Planning. Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)