I can summarize the conclusions I was led to by my personal experience, with a disclaimer that what follows may not be the entirely right explanation. The anwer seems to lie in the differences between Mathematica call stack and traditional call stacks, which originates from Mathematica pattern-defined functions being really rules. So, there are no real function calls. Mathematica needs a stack for a different reason: since normal evaluation happens from the bottom of an expression tree, it must keep intermediate expressions in case when deeper and deeper parts of (sub)expressions get replaced as a result of rule application (some parts of an expression grow from the bottom). This is the case, in particular, for rules defining what we'd call non tail-recursive functions in other languages. So, once again, the stack in Mathematica is a stack of intermediate expressions, not function calls.
This means that if, as a result of rule application, an (sub)expression can be rewritten in its entirety, the expression branch need not be kept on the expression stack. This is probably what is referred as tail call optimization in Mathematica - and this is why in such cases we have iteration rather than recursion (this is one very good example of the differences between rule applications and function calls). Rules like f[x_]:=f[x+1]
are of this type. If, however, some sub-expression get rewritten, producing more expression structure, then expression must be stored on the stack. The rule f[x_ /; x < 5] := (n += 1; f[x + 1])
is of this type, which is a bit hidden until we recall that ()
stand for CompoundExpression[]
. Schematically what happens here is f[1] -> CompoundExpression[n+=1, f[2]] -> CompoundExpression[n+=1,CompoundExpression[n+=1,f[3]]]->etc
. Even though the call to f is the last every time, it happens before the full CompoundExpression[]
executes, so this still must be kept on the expression stack. One could perhaps argue that this is a place where optimization could be made, to make an exception for CompoundExpression, but this is probably not easy to implement.
Now, to illustrate the stack accumulation process which I schematically described above, let us limit the number of recursive calls:
Clear[n, f, ff, fff];
n = 0;
f[x_ /; x < 5] := (n += 1; f[x + 1]);
ff[x_] := Null /; (n += 1; False);
ff[x_ /; x < 5] := ff[x + 1];
fff[x_ /; x < 5] := ce[n += 1, fff[x + 1]];
Tracing the evaluation:
In[57]:= Trace[f[1],f]
Out[57]= {f[1],n+=1;f[1+1],{f[2],n+=1;f[2+1],{f[3],n+=1;f[3+1],{f[4],n+=1;f[4+1]}}}}
In[58]:= Trace[ff[1],ff]
Out[58]= {ff[1],ff[1+1],ff[2],ff[2+1],ff[3],ff[3+1],ff[4],ff[4+1],ff[5]}
In[59]:= Trace[fff[1],fff]
Out[59]= {fff[1],ce[n+=1,fff[1+1]],{fff[2],ce[n+=1,fff[2+1]],{fff[3],ce[n+=1,fff[3+1]],
{fff[4],ce[n+=1,fff[4+1]]}}}}
What you can see from this is that the expression stack accumulates for f
and fff
(the latter used just to show that this is a general mechanism, with ce[]
just some arbitrary head), but not for ff
, because, for the purposes of pattern matching, the first definition for ff
is a rule tried but not matched, and the second definition rewrites ff[arg_]
in its entirety, and does not generate deeper sub-parts that need further rewriting. So, the bottom line seems that you should analyze your function and see if its recursive calls will grow the evaluated expression from the bottom or not. If yes, it is not tail-recursive as far as Mathematica is concerned.
My answer would not be complete without showing how to do the tail call optimization manually. As an example, let us consider recursive implementation of Select. We will work with Mathematica linked lists to make it reasonably efficient rather than a toy. Below is the code for the non tail-recursive implementation:
Clear[toLinkedList, test, selrecBad, sel, selrec, selTR]
toLinkedList[x_List] := Fold[{#2, #1} &, {}, Reverse[x]];
selrecBad[fst_?test, rest_List] := {fst,If[rest === {}, {}, selrecBad @@ rest]};
selrecBad[fst_, rest_List] := If[rest === {}, {}, selrecBad @@ rest];
sel[x_List, testF_] := Block[{test = testF}, Flatten[selrecBad @@ toLinkedList[x]]]
The reason I use Block and selrecBad is to make it easier to use Trace. Now, this blows the stack on my machine:
Block[{$RecursionLimit = Infinity}, sel[Range[300000], EvenQ]] // Short // Timing
You can trace on small lists to see why:
In[7]:= Trace[sel[Range[5],OddQ],selrecBad]
Out[7]= {{{selrecBad[1,{2,{3,{4,{5,{}}}}}],{1,If[{2,{3,{4,{5,{}}}}}==={},{},selrecBad@@{2,{3,{4,
{5,{}}}}}]},{selrecBad[2,{3,{4,{5,{}}}}],If[{3,{4,{5,{}}}}==={},{},selrecBad@@{3,{4,{5,
{}}}}],selrecBad[3,{4,{5,{}}}],{3,If[{4,{5,{}}}==={},{},selrecBad@@{4,{5,{}}}]},{selrecBad[4,
{5,{}}],If[{5,{}}==={},{},selrecBad@@{5,{}}],selrecBad[5,{}],{5,If[{}==={},{},selrecBad@@{}]}}}}}}
What happens is that the result gets accumulated deeper and deeper in the list. The solution is to not grow the depth of the resulting expression, and one way to achieve that is to make selrecBad accept one extra parameter, which is the (linked) list of accumulated results:
selrec[{fst_?test, rest_List}, accum_List] :=
If[rest === {}, {accum, fst}, selrec[rest, {accum, fst}]];
selrec[{fst_, rest_List}, accum_List] :=
If[rest === {}, accum, selrec[rest, accum]]
And modify the main function accordingly:
selTR[x_List, testF_] := Block[{test = testF}, Flatten[selrec[toLinkedList[x], {}]]]
This will pass our power test just fine:
In[14]:= Block[{$IterationLimit= Infinity},selTR[Range[300000],EvenQ]]//Short//Timing
Out[14]= {0.813,{2,4,6,8,10,12,14,16,18,20,
<<149981>>,299984,299986,299988,299990,299992,299994,299996,299998,300000}}
(note that here we had to modify $IterationLimit, which is a good sign). And using Trace reveals the reason:
In[15]:= Trace[selTR[Range[5],OddQ],selrec]
Out[15]= {{{selrec[{1,{2,{3,{4,{5,{}}}}}},{}],If[{2,{3,{4,{5,{}}}}}==={},{{},1},selrec[{2,{3,{4,
{5,{}}}}},{{},1}]],selrec[{2,{3,{4,{5,{}}}}},{{},1}],If[{3,{4,{5,{}}}}==={},{{},1},selrec[{3,
{4,{5,{}}}},{{},1}]],selrec[{3,{4,{5,{}}}},{{},1}],If[{4,{5,{}}}==={},{{{},1},3},selrec[{4,
{5,{}}},{{{},1},3}]],selrec[{4,{5,{}}},{{{},1},3}],If[{5,{}}==={},{{{},1},3},selrec[{5,
{}},{{{},1},3}]],selrec[{5,{}},{{{},1},3}],If[{}==={},{{{{},1},3},5},selrec[{},{{{{},1},3},5}]]}}}
which is, this version does not accumulate the depth of the intermediate expression, since the results are kept in a separate list.
Null /;
condition mysterious. Would you be so kind as to explain the desired effect and how it achieves that effect? What happens if you leave it off? – Reb.Cabinn
to count the number of calls. In the first example, incrementingn
directly blocked the tail call optimization. So, I movedn+=1
into another definition -- but as a part of an always-false condition so that the definition is essentially a no-op (albeit with the side-effect of updatingn
). I wroteNull
as an arbitrary choice since the return value is never used. It is possible to achieve the same result in a single definition:f[x_] := f[x + 1] /; (++n; True)
. – WReachFold
for iteration. I presumeFold
does the equivalent of tail-calling, i.e., replacing the values of accumulation variables instead of accumulating them. My presumption is bolstered by the fact that if I useFoldList
or if ISow
out the intermediate results, I run out of memory. It definitely does something different viaFold
. – Reb.CabinFold
is built into the kernel, my suspicion is that it is implemented as a simple imperative loop written in C -- especially since C compilers rarely perform tail call optimization. But if it is implemented recursively, I agree that it must be optimized.FoldList
does not overrun the stack but rather runs out of memory because it keeps a copy of the results from every iteration. Which is probably a lot of data long before 96 hours :) – WReach