6
votes

I have a recursive descent parser for an embedded ARM processor (in C + GCC, for ARM Cortex M3).

While running it I've noticed that it uses a massive amount of stack space (even more than you might expect) and under closer inspection I have found that this is happening:

extern int bar(int *p);

int foo() {
 int z = foo(); // it's an example!

 int n[100];  // stack usage
 return z+bar(n); // calling bar(n) stops n from being optimised out
}

Result of running arm-none-eabi-gcc -fomit-frame-pointer -S test.c

foo:
    str lr, [sp, #-4]!  ; Push link register
    sub sp, sp, #412    ; Reserve space on stack, even if we don't need it now!
    bl  foo             ; Recurse
    str r0, [sp, #404]  ; Store result
    ...

So at the start of the function, it pushes the entire stack frame onto the stack. However after a few iterations it's got loads of stuff on the stack that it hasn't used yet.

Ideally, what I'd like is for GCC to generate:

foo:
    str lr, [sp, #-4]!  ; Push link register
    ; Don't reserve space, because we don't need it
    bl  foo             ; Recurse
    sub sp, sp, #412    ; Reserve space now
    str r0, [sp, #404]  ; Store result
    ...

(This is probably not correct but I hope you get the idea)

Something a bit like this can be achieved with the following code, but it's really nasty (and if GCC inlines fooworker, it breaks again!). There must be a better way?

int fooworker(int z) {
 int n[100];  // stack usage
 return z+bar(n); // calling bar(n) stops n from being optimised out
}


int foo() {
 return fooworker(foo());
}

So is there a way of telling GCC to only enlarge the stack at the start of the basic block, or is there a 'barrier' statement that causes extra push/pop ops to be added at that point? I guess GCC is using one of the ARM standard call types - but is there a way to tag these functions with another call type that is a bit more efficient with the stack, or is there a way to rewrite the functions such that the stack is used a bit more sensibly?

Please don't tell me not to use recursion, it is not answering the question.

2
I tried that function now, it just saves r7 and lr, but I'm using a different toolchain, I stopped using codesourcy had too much problems with it - iabdalkader
Sorry, I changed the question a bit (I made a better code example) - have you used the latest code? After some testing I think most GCC versions will do the same thing. - Gordon Williams
Aniket: I changed the example, but is tail recursion not what is being done in the code at the end of the question? Ideally GCC would be able to have two different stack frames within the function - Gordon Williams
Whilst this is clearly suboptimal code, and deferring the allocation of space on the stack is fine, I wonder whether there detecting that it isn't becomes very difficult for non-trivial examples? - marko

2 Answers

3
votes
int *n = alloca(sizeof(*n) * 100);

It's ugly and I'd personally split up the function into two parts, but seems to work in my gcc on amd64 on all optimization levels.

0
votes

This is all susceptible to optimization, but you could also try to introduce a new scope:

extern int bar(int *p);

int foo() {
  int z = foo();

  {
    int n[100];
    return z+bar(n);
  }
}

The introduction of the new scope means that n should not live before foo() is called. Again, optimization could break all of this, like your own solution or the accepted one.