11
votes

I was messing around with tail-recursive functions in C++, and I've run into a bit of a snag with the g++ compiler.

The following code results in a stack overflow when numbers[] is over a couple hundred integers in size. Examining the assembly code generated by g++ reveals that twoSum_Helper is executing a recursive call instruction to itself.

The question is which of the following is causing this?

  • A mistake in the following that I am overlooking which prevents tail-recursion.
  • A mistake with my usage of g++.
  • A flaw in the detection of tail-recursive functions within the g++ compiler.

I am compiling with g++ -O3 -Wall -fno-stack-protector test.c on Windows Vista x64 via MinGW with g++ 4.5.0.

struct result
{
    int i;
    int j;
    bool found;
};

struct result gen_Result(int i, int j, bool found)
{
    struct result r;
    r.i = i;
    r.j = j;
    r.found = found;
    return r;
}

// Return 2 indexes from numbers that sum up to target.
struct result twoSum_Helper(int numbers[], int size, int target, int i, int j)
{
    if (numbers[i] + numbers[j] == target)
        return gen_Result(i, j, true);
    if (i >= (size - 1))
        return gen_Result(i, j, false);
    if (j >= size)
        return twoSum_Helper(numbers, size, target, i + 1, i + 2);
    else
        return twoSum_Helper(numbers, size, target, i, j + 1);
}
8
Did you already try doing the conditional increments separately and making the recursive call only once with the incremented parameters? It is less nice than your example, but it might shed some light on your problem. - stefaanv
@stefaanv Yes, to no avail. It appears that the call is occurring on the else statement, but no amount of tweaking will cause it to use a jmp instead of a call. - Swiss
Does it work if you use a single statement à la return twoSum_Helper(numbers, size, target, i + j_ge_size, j_ge_size ? i + 2 : j + 1), where j_ge_size is bool j >= size? (Suit yourself re implicit conversion from bool.) - Tony Delroy
@Tony Assembly still has a call to itself for what equates to the else clause above. - Swiss
Tom makes an interesting observation in stackoverflow.com/questions/34125 - his tail recursion needed the function to be static...? - Tony Delroy

8 Answers

4
votes

Tail call optimization in C or C++ is extremely limited, and pretty much a lost cause. The reason is that there generally is no safe way to tail-call from a function that passes a pointer or reference to any local variable (as an argument to the call in question, or in fact any other call in the same function) -- which of course is happening all over the place in C/C++ land, and is almost impossible to live without.

The problem you are seeing is probably related: GCC likely compiles returning a struct by actually passing the address of a hidden variable allocated on the caller's stack into which the callee copies it -- which makes it fall into the above scenario.
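
To make the "hidden variable" idea concrete, here is a rough sketch of what returning a struct by value can look like once it is lowered to an explicit out-pointer. The names make_by_value, make_lowered and forward_lowered are made up for illustration, and the real calling convention depends on the platform ABI and the compiler, so treat this as a mental model rather than what g++ actually emits.

```cpp
struct result
{
    int i;
    int j;
    bool found;
};

// What the source code says: return the struct by value.
result make_by_value(int i, int j, bool found)
{
    result r;
    r.i = i;
    r.j = j;
    r.found = found;
    return r;
}

// Hypothetical hand-lowered form: the caller owns the storage and passes
// its address; the callee writes the result through that pointer.
void make_lowered(result* out, int i, int j, bool found)
{
    out->i = i;
    out->j = j;
    out->found = found;
}

// A caller in lowered form that forwards its own `out` unchanged.
// Nothing in this frame is needed after the call, which is the shape
// a tail call requires.
void forward_lowered(result* out, int i, int j)
{
    make_lowered(out, i, j, true);
}
```

If the compiler instead allocates a fresh temporary in the current frame and copies it out after the call, the call is no longer the last thing the function does.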

2
votes

Try compiling with -O2 instead of -O3.

How do I check if gcc is performing tail-recursion optimization?

Well, it doesn't work with -O2 anyway. The only thing that seems to work is returning the result object through a reference that is passed as a parameter.

But really, it's much easier to just remove the tail call and use a loop instead. TCO is there to optimize tail calls that show up after inlining or aggressive unrolling; you shouldn't rely on recursion when handling large inputs anyway.
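
For reference, a loop version could look roughly like the sketch below. twoSum_iterative is a name I made up, and returning -1 for both indexes when nothing is found is my own choice (the code in the question returns the last i and j it looked at).

```cpp
struct result
{
    int i;
    int j;
    bool found;
};

// Iterative sketch of the same pair search: no recursion, so stack use
// stays constant no matter how large `size` is.
result twoSum_iterative(const int numbers[], int size, int target)
{
    for (int i = 0; i < size - 1; ++i) {
        for (int j = i + 1; j < size; ++j) {
            if (numbers[i] + numbers[j] == target) {
                result hit = { i, j, true };
                return hit;
            }
        }
    }
    // Assumption: -1 marks "no pair found".
    result none = { -1, -1, false };
    return none;
}
```

This does not depend on any optimization flag, and it never reads numbers[j] with j equal to size.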

1
votes

I can't get g++ 4.4.0 (under mingw) to perform tail recursion, even on this simple function:

static void f (int x)
  {
  if (x == 0) return ;
  printf ("%p\n", &x) ; // or cout in C++, if you prefer
  f (x - 1) ;
  }

I've tried -O3, -O2, -fno-stack-protector, C and C++ variants. No tail recursion.
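
One possible reason, in line with the point made in another answer about pointers to locals: f passes &x to printf, so the address of a local escapes right before the recursive call. Below is a sketch of a variant that prints the value instead. countdown and its steps parameter are hypothetical names added for illustration, and whether a given compiler turns this into a jump still has to be checked in the -S output.

```cpp
#include <cstdio>

// Variant that never takes the address of a local. The recursive call is
// the last action, and its result is returned unchanged.
static int countdown(int x, int steps = 0)
{
    if (x == 0)
        return steps;
    std::printf("%d\n", x);   // prints the value, not &x
    return countdown(x - 1, steps + 1);
}
```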

0
votes

I would look at 2 things.

  1. The return call in the if statement will have a branch target for the else in the current stack frame that needs to be resolved after the call (which would mean any TCO attempt would not be able to overwrite the executing stack frame, thus negating the TCO).

  2. The numbers[] array argument is a variable-length data structure, which could also prevent TCO, because in TCO the same stack frame is reused in one way or another. If the call is self-referencing (like yours), it will overwrite the stack-defined (local) variables with the values/references of the new call. If the tail call is to another function, it will overwrite the entire stack frame with the new function's (where TCO can be done in A => B => C, it could make this look like A => C in memory during execution). I would try a pointer.

It has been a couple months since I have built anything in C++ so I didn't run any tests, but I think one/both of those are preventing the optimization.
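
On the second point, it may help to know that a parameter declared as int numbers[] is adjusted by the language to int* numbers, so switching to a pointer should not change the function's type. A small compile-time check, with takes_array, takes_pointer and array_param_is_pointer as made-up names:

```cpp
#include <type_traits>

// Two declarations that differ only in how the parameter is spelled.
void takes_array(int numbers[]);
void takes_pointer(int* numbers);

// True if both declarations have the same function type, i.e. the array
// parameter has already been adjusted to a pointer.
constexpr bool array_param_is_pointer =
    std::is_same<decltype(takes_array), decltype(takes_pointer)>::value;

static_assert(array_param_is_pointer,
              "int numbers[] in a parameter list is adjusted to int*");
```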

0
votes

Try changing your code to:

// Return 2 indexes from numbers that sum up to target.
struct result twoSum_Helper(int numbers[], int size, int target, int i, int j)
{
    if (numbers[i] + numbers[j] == target)
        return gen_Result(i, j, true);
    if (i >= (size - 1))
        return gen_Result(i, j, false);

    if (j >= size) {
        i++;        // call by value, changing i here does not matter
        j = i + 1;  // restart j just past the new i
    } else {
        j++;
    }
    return twoSum_Helper(numbers, size, target, i, j);
}

edit: removed unnecessary parameter as per comment from asker

// Return 2 indexes from numbers that sum up to target.
struct result twoSum_Helper(int numbers[], int size, int target, int i)
{
    if (numbers[i] + numbers[i+1] == target || i >= (size - 1))
        return gen_Result(i, i+1, true);

    if(i+1 >= size)
        i++; //call by value, changing i here does not matter
    return twoSum_Helper(numbers, size, target, i);
}
0
votes

Support for tail call optimization (TCO) is limited in C/C++.

So if the code relies on TCO to avoid a stack overflow, it may be better to rewrite it with a loop. Otherwise, some automated test is needed to make sure that the code is actually optimized.

Typically TCO may be suppressed by:

  • passing pointers to objects on the stack of the recursive function to external functions (in C++, also passing such objects by reference);
  • a local object with a non-trivial destructor, even if the tail recursion is otherwise valid (the destructor has to run after the recursive call returns, so the call is not really the last action), for example Why isn't g++ tail call optimizing while gcc is?
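
The destructor case can be made visible with a small sketch. Noisy, descend and the events log are hypothetical names used only to record the order in which things happen.

```cpp
#include <string>
#include <vector>

// Hypothetical event log, only there to make the ordering observable.
static std::vector<std::string> events;

struct Noisy
{
    int id;
    explicit Noisy(int id_) : id(id_) {}
    ~Noisy() { events.push_back("~Noisy(" + std::to_string(id) + ")"); }
};

void descend(int n)
{
    if (n == 0) {
        events.push_back("base case");
        return;
    }
    Noisy local(n);          // local object with a non-trivial destructor
    return descend(n - 1);   // looks like a tail call, but ~Noisy runs after it
}
```

After descend(2), the log holds "base case", "~Noisy(1)", "~Noisy(2)": each frame still has work to do once the recursive call has returned, so the frame cannot simply be reused.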

Here, TCO is defeated by returning the structure by value. It can be fixed if the results of all recursive calls are written to the same memory address, allocated in another function, twoSum (similar to the answer https://stackoverflow.com/a/30090390/4023446 to Tail-recursion not happening):

struct result
{
    int i;
    int j;
    bool found;
};

struct result gen_Result(int i, int j, bool found)
{
    struct result r;
    r.i = i;
    r.j = j;
    r.found = found;
    return r;
}

struct result* twoSum_Helper(int numbers[], int size, int target,
    int i, int j, struct result* res_)
{
    if (i >= (size - 1)) {
        *res_ = gen_Result(i, j, false);
        return res_;
    }
    if (numbers[i] + numbers[j] == target) {
        *res_ = gen_Result(i, j, true);
        return res_;
    }
    if (j >= size)
        return twoSum_Helper(numbers, size, target, i + 1, i + 2, res_);
    else
        return twoSum_Helper(numbers, size, target, i, j + 1, res_);
}

// Return 2 indexes from numbers that sum up to target.
struct result twoSum(int numbers[], int size, int target)
{
    struct result r;
    return *twoSum_Helper(numbers, size, target, 0, 1, &r);
}

The value of the res_ pointer is the same for all recursive calls of twoSum_Helper. The assembly output (the -S flag) shows that the twoSum_Helper tail recursion is optimized into a loop, even with two recursive exit points.

Compile options: g++ -O2 -S (g++ version 4.7.2).

-2
votes

I have heard others complain that tail recursion is only optimized with gcc and not g++. Could you try using gcc?

-3
votes

Since the code of twoSum_Helper calls itself, it shouldn't come as a surprise that the assembly shows exactly that happening. That's the whole point of recursion :-) So this hasn't got anything to do with g++.

Every recursive call creates a new stack frame, and stack space is limited by default. You can increase the stack size (I don't know how to do that on Windows; on UNIX the ulimit command is used), but that only defers the crash.

The real solution is to get rid of the recursion. See for example this question and this question.