Adding to rici's answer (upvote rici not me)...
It gets even better, taking what you offered and wrapping it
extern unsigned int * GetGPU_Pointer ( unsigned int );
void fun ( unsigned int framebufferAddress )
{
unsigned int * gpuPointer = GetGPU_Pointer(framebufferAddress);
unsigned int color = 16;
int y = 768;
int x = 1024;
while(y >= 0)
{
while(x >= 0)
{
*gpuPointer = color;
color = color + 2;
x--;
}
color++;
y--;
x = 1024;
}
}
Optimizes to
00000000 <fun>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <GetGPU_Pointer>
8: e59f3008 ldr r3, [pc, #8] ; 18 <fun+0x18>
c: e5803000 str r3, [r0]
10: e8bd4008 pop {r3, lr}
14: e12fff1e bx lr
18: 00181110 andseq r1, r8, r0, lsl r1
because the code really doesnt do anything but that one store.
Now if you were to modify the pointer
while(x >= 0)
{
*gpuPointer = color;
gpuPointer++;
color = color + 2;
x--;
}
then you get the store you were looking for
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: ebfffffe bl 0 <GetGPU_Pointer>
8: e59f403c ldr r4, [pc, #60] ; 4c <fun+0x4c>
c: e1a02000 mov r2, r0
10: e3a0c010 mov ip, #16
14: e2820a01 add r0, r2, #4096 ; 0x1000
18: e2801004 add r1, r0, #4
1c: e1a0300c mov r3, ip
20: e4823004 str r3, [r2], #4
24: e1520001 cmp r2, r1
28: e2833002 add r3, r3, #2
2c: 1afffffb bne 20 <fun+0x20>
30: e28ccb02 add ip, ip, #2048 ; 0x800
34: e28cc003 add ip, ip, #3
38: e15c0004 cmp ip, r4
3c: e2802004 add r2, r0, #4
40: 1afffff3 bne 14 <fun+0x14>
44: e8bd4010 pop {r4, lr}
48: e12fff1e bx lr
4c: 00181113 andseq r1, r8, r3, lsl r1
or if you make it volatile (and then dont have to modify it)
volatile unsigned int * gpuPointer = GetGPU_Pointer(framebufferAddress);
then
00000000 <fun>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <GetGPU_Pointer>
8: e59fc02c ldr ip, [pc, #44] ; 3c <fun+0x3c>
c: e3a03010 mov r3, #16
10: e2831b02 add r1, r3, #2048 ; 0x800
14: e2812002 add r2, r1, #2
18: e5803000 str r3, [r0]
1c: e2833002 add r3, r3, #2
20: e1530002 cmp r3, r2
24: 1afffffb bne 18 <fun+0x18>
28: e2813003 add r3, r1, #3
2c: e153000c cmp r3, ip
30: 1afffff6 bne 10 <fun+0x10>
34: e8bd4008 pop {r3, lr}
38: e12fff1e bx lr
3c: 00181113 andseq r1, r8, r3, lsl r1
then you get your store
arm-none-eabi-gcc -O2 -c a.c -o a.o
arm-none-eabi-objdump -D a.o
arm-none-eabi-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The problem is, as written, you didnt tell the compiler to update the pointer more than the one time. So as in my first example it has no reason to even implement the loop, it can pre-compute the answer and write it one time. In order to force the compiler to implement the loop and write to the pointer more than one time, you either need to make it volatile and/or modify it, depends on what you were really needing to do.