However, I am unsure of how many times Prolog backtracks.
Not a really good question. Backtracks from where to where?
Consider that Prolog builds a search tree (or "proof tree") such that:
- An OR node is created for each predicate call, where a child is one of the possible clauses of the predicate.
- An AND node is created for each clause, where a child is one of the required predicate calls of the clause.
A valid proof is found if the proof has this form:
- Recursively down the tree from the root "goal":
- Each AND node has all children marked "proven"/"true"
- Each OR node has exactly one of its children marked "proven"/"true"
- The leaf nodes are axioms and "proven"/"true" by defintion: they are the facts.
"Backtracking" occurs when Prolog finds a situation where the the proof stalls:
- None of the children of an OR node can be proven. Then "backtrack" up one level of the tree, probably to an AND node (where more backtracking will occur).
- One of the children of an AND node cannot be proven. Then "backtrack" towards the previous sibling of the child that cannot be proven, asking for a "redo". If there is no previous sibling, backtrack up one level of the tree, probably to an OR node (which will have to try another of its children).
A complementary view of the execution of a Prolog program as a search tree is the "Byrd Box Model". If have collected a few notes. They are less than perfect but here they are: About the "Byrd Box Model". Take a look.
Anyway, back to that program.
It is informative to run it through the debugger, but as replacement solution, we can add some printing statements:
p(X, Y) :-
format("p(~q,~q) clause #1\n",[X,Y]),
q(X, Y),
format("In p/2, success with q(~q,~q)\n",[X,Y]).
p(X, Y) :-
format("p(~q,~q) clause #2\n",[X,Y]),
r(X, Y),
format("In p/2, success with r(~q,~q)\n",[X,Y]).
q(X, Y) :-
format("q(~q,~q)\n",[X,Y]),
s(X),
format("In q/2, success with s(~q)\n",[X]),
t(Y),
format("In q/2, success with t(~q)\n",[Y]).
r(X, Y) :-
format("r(~q,~q)\n",[X,Y]),
u(X, Y),
format("In r/2, success with r(~q,~q)\n",[X,Y]).
s(f(a)) :- format("Fact s(f(a))\n").
t(g(b)) :- format("Fact t(g(b))\n").
u(a, g(b)) :- format("Fact u(a,g(b))\n").
Running this brings:
?- p(a,g(Y)).
p(a,g(_1054)) clause #1
q(a,g(_1054))
p(a,g(_1054)) clause #2
r(a,g(_1054))
Fact u(a,g(b))
In r/2, success with r(a,g(b))
In p/2, success with r(a,g(b))
Y = b.
We can deduce the search:
- Start at p/2
- Go to q/2
- Here we fail, there is no s(X), so we backtrack to p/2 and try another clause (the next child of the OR node)
- At p/2 again
- Go to r/2
- We are hitting the bottom of the tree: a fact!
- And all that remains is to execute children of AND nodes that perform printout.
Don't know whether that makes things clearer?
One last Q, the facts having nested facts, IE
s(f(a))
where f is undefined, can I assume this is syntactically equal to s(a)?
Absolutely not. s(f(a)) is a syntactic structure. Nothing in it needs to be "defined" and nothing of it will be evaluted. It is different from s(a), another syntactic structure. This can be seen on the Prolog toplevel:
?- s(f(a)) == s(a).
false.