I'm trying to learn ARM assembly.

After writing this small Hello World program:

                .global _start

                .text
_start:         ldr     R1,=msgtxt      
                mov     R2,#13          
                mov     R0,#1           
                mov     R7,#4           
                svc     0               

                mov     R7,#1           
                svc     0               


                .data
msgtxt:         .ascii  "Hello World!\n"

                .end

I noticed that I could remove the .text and .data directives and the program would work just as well.

I'm therefore curious: everything I read emphasized that the .text section is to be used for code and .data for data. But here, before my eyes, they seem to do nothing at all!

Therefore, if these are not used to hold code and data respectively, what is their true purpose?

2 Answers

5
votes

These directives choose which section the code or data that follows is placed in, and what each section means in practice depends on the architecture and platform you're building your program for. In the end, everything is just a string of bytes. When your program is assembled and linked, the symbols/labels are assigned memory addresses according to the section they're in.

.text is generally allocated in a read-only memory section, most suitable for code that isn't expected to change.

.data is typically a writable section of memory. I believe that it's quite common to put your string in .text right next to your code if it isn't expected to change (or in a similar read-only section, if the architecture has one). I would say that the .data section is even avoided most of the time. Why? Because the .data section needs to be initialized: its contents are stored in the program binary and copied into memory when the program starts. Most data that your program references can be read-only, and any memory it needs for its operations is usually just reserved in the .bss section, which sets aside uninitialized memory without storing anything for it in the binary.

There are some advantages to mixing code and data in the same section, such as easy access to the address of the data through a relative offset from the PC register (the address of the code being executed). Then of course there are the disadvantages: if you try to modify read-only memory, at best the write is ignored, and at worst the program triggers an exception and crashes. This is all very architecture-specific, and the safest bet is to keep code in sections meant for code, and data/allocations in sections meant for data.
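As a rough illustration of the writable sections, here is a minimal sketch assuming 32-bit ARM Linux and the GNU assembler. The labels counter and scratch are hypothetical names invented for this example; the point is only that stores to .data and .bss are allowed, whereas the same store to a label placed in .text would normally fault under a default link.

```asm
                .global _start

                .text
_start:         ldr     r1, =counter    @ address of a writable word in .data
                ldr     r0, [r1]        @ r0 = 5, the initial value stored in the binary
                add     r0, r0, #1      @ r0 = 6
                str     r0, [r1]        @ allowed: .data is mapped read+write
                ldr     r2, =scratch    @ address of the buffer reserved in .bss
                str     r0, [r2]        @ allowed: .bss is writable too
                mov     r7, #1          @ sys_exit; the exit status is r0
                svc     0

                .data
                .balign 4
counter:        .word   5               @ initialized data: the value takes space in the file

                .bss
                .balign 4
scratch:        .space  64              @ hypothetical scratch buffer: only its size is recorded
```

Moving counter into .text and running the program again is a quick way to see the difference for yourself.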

It's all very specific to what your program is targeting. For example, the Game Boy Advance had a 256KB "slow" memory region, a 32KB "fast" memory region, and then the read-only "ROM" region (the game cartridge data) which can be several megabytes, and assemblers used these memory sections:

.data or .iwram  -> Internal RAM (32KB)
.bss             -> Internal RAM uninitialized
.ewram           -> External RAM (256KB)
.sbss            -> External RAM uninitialized
.text or .rodata -> Read only ROM (cartridge size)

To give another example, the SPC-700 (SNES sound chip) had 64KB of readable and writable memory that was used for everything, but the first 256 bytes of it had faster access (the "zero page"). In this theoretical case, .data and .text would be assigned to the same memory region--that is, they would not be allocated in the zero-page, and they both share the same memory. There would be a custom segment for the zero-page, and the difference between .text and .data would be very little - just a way to distinguish which symbols in the assembled program point to "data" and which symbols point to program code.

3
votes

GAS (like most assemblers) defaults to the .text section, and your read-only data still works in .text

Everything is just bytes


You can do echo 'mov r1, #2' > foo.s and assemble+link that into an ARM binary (with gcc -nostdlib -static foo.s, for example). You can single-step that instruction in GDB.

(Without a sys_exit system call your program will crash after that instruction, but of course you could add the exit call too, still without any directives.)

The linker will warn that it didn't find a _start symbol, because you left out the label itself, not to mention the .globl directive that tells the assembler to make it visible in the object file's symbol table.

But GNU binutils ld's default is to use the start of the .text section as the ELF entry point.
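For instance, a file with no labels and no directives that still exits cleanly could look like the sketch below. It assumes 32-bit ARM Linux (EABI), where the system call number goes in r7; the exit status of 0 is an arbitrary choice for the example.

```asm
mov     r0, #0          @ exit status (arbitrary choice for this sketch)
mov     r7, #1          @ sys_exit
svc     0
```

The assembler puts these three instructions in .text by default, and ld still warns about the missing _start before falling back to the start of .text as the entry point.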

Most sections other than .text aren't linked into executable memory by default, so having _start: in .data would normally be a problem.


Read-only data should normally go in the .rodata section, which is linked as part of the TEXT segment anyway. So as far as runtime behaviour is concerned, placing it at the end of the .text section (by leaving out .data) is pretty much exactly equivalent to what you should have done.
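As a sketch of that advice, the program from the question could keep its string in .rodata instead of .data. Only the section directive changes; the comments are added here for illustration and assume the 32-bit ARM Linux system call numbers the question already uses.

```asm
                .global _start

                .text
_start:         ldr     R1,=msgtxt      @ address of the string
                mov     R2,#13          @ length in bytes
                mov     R0,#1           @ fd 1 = stdout
                mov     R7,#4           @ sys_write
                svc     0

                mov     R7,#1           @ sys_exit
                svc     0

                .section .rodata
msgtxt:         .ascii  "Hello World!\n"
```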

See also: What's the difference of section and segment in ELF file format

Putting it in .data leads to the linker putting it in a different segment that tells the OS's ELF program loader to map it read+write (and not execute).

The point of having a .rodata section separate from .text is to group code together and data together. Many CPUs have split L1d and L1i caches, and/or separate TLBs for data / instructions, so fine-grained mixing of read-only data with code wastes space in split caches.

In your case, you're not linking any other files that also have some code and some data, so there's no difference.