
When I generate ARM assembly code from C code with gcc -S, I get a variant of the LDR instruction that I don't know. Specifically, I get the instruction "ldr r3, .L5", where ".L5" is a label defined by the compiler. It is not clear to me why I don't get the pseudoinstruction "ldr r3, =.L5", which should be the only way to load an arbitrary number in a register.

In more detail:

  1. I start from this C code (file name: sum_squares_C.c):
int sum;
int n = 1;

int main(){
    sum = 0;
    for(int i=1; i<=n; i++){
            sum = sum + i*i;
    }
}
  2. Then, on a Raspberry Pi, I compile with "gcc -O0 -S sum_squares_C.c", using compiler version gcc (Raspbian 8.3.0-6+rpi1) 8.3.0.

  3. The output is this ARM code (the instruction "ldr r3, .L5" is the 7th line after the label "main"):

    .arch armv6
    .eabi_attribute 28, 1
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 6
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .file   "sum_squares_C.c"
    .text
    .global n
    .data
    .align  2
    .type   n, %object
    .size   n, 4
n:
    .word   1
    .comm   sum,4,4
    .text
    .align  2
    .global main
    .arch armv6
    .syntax unified
    .arm
    .fpu vfp
    .type   main, %function
main:
    @ args = 0, pretend = 0, frame = 8
    @ frame_needed = 1, uses_anonymous_args = 0
    @ link register save eliminated.
    str fp, [sp, #-4]!
    add fp, sp, #0
    sub sp, sp, #12
    ldr r3, .L5
    mov r2, #0
    str r2, [r3]
    mov r3, #1
    str r3, [fp, #-8]
    b   .L2
.L3:
    ldr r3, [fp, #-8]
    ldr r2, [fp, #-8]
    mul r2, r2, r3
    ldr r3, .L5
    ldr r3, [r3]
    add r3, r2, r3
    ldr r2, .L5
    str r3, [r2]
    ldr r3, [fp, #-8]
    add r3, r3, #1
    str r3, [fp, #-8]
.L2:
    ldr r3, .L5+4
    ldr r3, [r3]
    ldr r2, [fp, #-8]
    cmp r2, r3
    ble .L3
    mov r3, #0
    mov r0, r3
    add sp, fp, #0
    @ sp needed
    ldr fp, [sp], #4
    bx  lr
.L6:
    .align  2
.L5:
    .word   sum
    .word   n
    .size   main, .-main
    .ident  "GCC: (Raspbian 8.3.0-6+rpi1) 8.3.0"
    .section    .note.GNU-stack,"",%progbits

It seems to me that gcc uses the instruction "ldr r3, .L5" as equivalent to "ldr r3, =.L5". Is that correct? Where can I find the definition of this instruction syntax? Is it possible to force gcc not to use this instruction and to use "ldr r3, =.L5" instead (I need this for teaching reasons)?

Thanks! Francesco

And which value is stored at .L5? - user253751
It seems to be the address of the variable sum. - Francesco
Those are not equivalent. ldr r3,.L5 puts the value stored at address .L5 (labels are addresses) into r3; ldr r3,=.L5 puts the address .L5 itself into r3. Completely different. For the former, the assembler replaces it with a pc-relative load. For the latter, the assembler tries to create a value in a nearby pool and generates a pc-relative load from it; the linker then puts the address of .L5 there once it is known. - old_timer
It is good (best, even) to examine the disassembly first and then, if needed, come back to the assembly, or at least to compare the assembly and the disassembly with each other. Most of these kinds of questions will answer themselves. - old_timer
You didn't define n, did you? And if you optimize that, it is dead code; unoptimized code is harder to read. If you returned the sum but declared n inside the function and optimized, gcc should simply calculate the result and return it rather than generate the loop. If you passed n in to a function as an argument and then returned the sum, it should optimize to a simpler non-loop form but still produce some code. - old_timer

2 Answers


ldr r3, .L5 loads a word from the address .L5 into r3. At the label .L5 there is the address of the variable sum. So this loads the address of sum into r3.

ldr r3, =.L5 loads the address of .L5 into r3. Then the program would need to dereference it again in order to get the address of sum. There is no reason to do this.

When you use ldr r3, =.L5 the assembler stores the address of .L5 somewhere, and then loads from that address. So this:

    ldr r3, =.L5
    ...
.L5:
    .word sum

is the same as this:

    ldr r3, .address_of_L5
    ...
.L5:
    .word sum
    ...
.address_of_L5:
    .word .L5

As you can see, the compiler has already done this for sum. Instead of writing this assembly:

    ldr r3, =sum

the compiler has written:

    ldr r3, .L5
    ...
.L5:
    .word sum

which is exactly what the assembler would have done anyway. I don't know why the compiler wants to do this instead of the assembler.

> It is not clear to me why I don't get the pseudoinstruction "ldr r3, =.L5", which should be the only way to load an arbitrary number in a register.

Notice this is not the only way to load an arbitrary number into a register. It's not even a real way to load an arbitrary number into a register. It's a pseudoinstruction (as you know): it's not something the CPU can actually do, it's something that the assembler can "compile" for your convenience.


To save typing, and accepting some risk, a person might write:

ldr r3,=sum
ldr r3,[r3]

As pointed out in the other answer, the assembler will produce machine code equivalent to what a human could have typed without the =address trick:

ldr r3,address_of_sum   @ without the =
ldr r3,[r3]
...
address_of_sum: .word sum

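That first ldr is not a pseudoinstruction, since it translates one-to-one into a real instruction. It is a pc-relative load, assuming the pool word is within reach: in ARM state the offset is a 12-bit immediate, so the word must lie within 4095 bytes either side of the pc value.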

Both of these, though, are assembler-specific, as assembly language is defined by the assembler, not by the target.

The =address shortcut is not supported by all ARM assemblers and should be used with care: for certain values it does not turn into a word in the pool with a pc-relative load.

For questions like this, first examine the disassembly; most of the time that will answer your question. Even better, examine the disassembly first and then, if in doubt, the assembly. Compiler-generated assembly is not as easy to read and follow as a disassembly, especially when linked. It is also easier to learn from optimized code than from unoptimized code, since so much of the unoptimized code is this stack (or in this case global) variable handling.

ldr r3,=0x1000
ldr r3,=0x1234
b .

00000000 <.text>:
   0:   e3a03a01    mov r3, #4096   ; 0x1000
   4:   e51f3000    ldr r3, [pc, #-0]   ; c <.text+0xc>
   8:   eafffffe    b   8 <.text+0x8>
   c:   00001234    andeq   r1, r0, r4, lsr r2

Where it can, the assembler generates a mov; where it can't, it allocates a word from the pool, places the value there and then does a pc-relative load. When reading output this way, you need to recognize and ignore the andeq disassembly: that last line is the pool word holding the value 0x00001234, which the disassembler decodes as if it were an instruction. The line to look at is the ldr generated to load it.

If you choose to try various tools, you should not assume the =address trick will always work. With GNU as it works if the assembler can find a place for the pool; if it can't, you either need to do the typing yourself or add a .pool directive (GNU as also accepts .ltorg, which does the same thing) so the assembler has a place for this value.

I would expect an assembler to always place this (=address) in the pool for an external reference. However, it is technically possible for a toolchain to put a placeholder there and let the linker fill it in, either with a mov or by adding a nearby item that holds the value, as binutils does for a bl to an external reference.

gas:

ldr r3,=sum
b .

00000000 <.text>:
   0:   e51f3000    ldr r3, [pc, #-0]   ; 8 <.text+0x8>
   4:   eafffffe    b   4 <.text+0x4>
   8:   00000000    andeq   r0, r0, r0

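The linker will fill in the address later, as with your compiler output. Now, the #-0 in the disassembly is very interesting, almost amusing: the encoding has the U (add/subtract) bit clear and a zero offset, so the disassembler prints a "minus zero". Since the pc reads as the instruction's address plus 8, the load comes from address 8, which is the pool word.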