
I'm reading, in parallel, various books on computer architecture and I'm confused. Some books state that assembly instructions are just mnemonics for machine instructions, and that each assembly instruction corresponds to exactly one machine instruction. However, Tanenbaum's Structured Computer Organization places assembly in the layer above the operating system, and seems to imply that assembly somehow uses the operating system (I haven't read the whole book yet...)

Which one is true? Are assembly instructions simply machine instructions? Can they also be system calls that the OS translates into machine instructions? Can they be something else?

It depends on how you think about it. If you consider only user-mode programs, and think of system calls as opaque "magic" things, then sure, you could think of the assembly as relying on the OS. However, in that case the machine language relies on the OS just as much as the assembly does. I do, however, want to note that assembly does not always map 1:1 to machine code. On some platforms the same assembly can be assembled in multiple ways, though usually one is faster. - Thomas Jager
@ThomasJager: thanks. Could you provide an example of how machine language could rely on the OS? - blue_note
When the CPU executes the bytes 0F 05 (the machine encoding of the syscall instruction) on an x86-64 machine, it starts running OS code in a privileged mode. - Thomas Jager
Depending on the architecture, certain assembly instructions might have multiple machine-code encodings, and vice versa. Those are the exceptions, though, and they don't matter in everyday practice. - Jester
Assemblers are known to perform conversions on some instructions. For instance, an assembler targeting 16-bit code on the 8086 couldn't emit shl ax, 2 as a single instruction: the 8086 only supports an immediate shift count of 1 (larger counts have to go in CL). So some assemblers would emit two instructions, shl ax, 1 followed by shl ax, 1, which is equivalent to shl ax, 2 on the 80186 and later processors that support the immediate-count form. - Michael Petch
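The two comments above (one source line with several valid encodings, and an assembler rewriting shl ax, 2 for the 8086) can be sketched with a tiny hypothetical encoder. The function names and the "8086"/"80186" CPU switch are assumptions for illustration; the byte values are the standard 16-bit-mode encodings (D1 /4 for a shift by 1, C1 /4 ib for an immediate count, 89 /r and 8B /r for mov).

```python
# Hypothetical 16-bit x86 encoder covering only the two example lines.
# Encodings in 16-bit mode (no prefixes):
#   shl ax, 1    -> D1 E0       (SHL r/m16, 1)
#   shl ax, imm8 -> C1 E0 ib    (SHL r/m16, imm8; 80186 and later only)
#   mov ax, bx   -> 89 D8 (MOV r/m16, r16) or 8B C3 (MOV r16, r/m16)

SHL_AX_1 = bytes([0xD1, 0xE0])

def encode_shl_ax(count, cpu):
    """Encode 'shl ax, count' for a CPU named "8086" or "80186" (assumed names)."""
    if not 1 <= count <= 0xFF:
        raise ValueError("shift count must fit in an imm8 and be at least 1")
    if count == 1:
        return SHL_AX_1
    if cpu == "8086":
        # No immediate-count form on the 8086: repeat the shift-by-1 instruction.
        return SHL_AX_1 * count
    return bytes([0xC1, 0xE0, count])

# Two equally valid encodings of the single source line "mov ax, bx".
MOV_AX_BX_ENCODINGS = [bytes([0x89, 0xD8]), bytes([0x8B, 0xC3])]

def shl16(value, count):
    """Reference semantics: logical left shift of a 16-bit register."""
    return (value << count) & 0xFFFF
```

Checking shl16(shl16(x, 1), 1) == shl16(x, 2) for every 16-bit x confirms that the two-instruction 8086 expansion computes the same result as the single 80186 instruction.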

1 Answer


Mostly yes, one line of assembly corresponds to one CPU instruction. But there are some caveats.

Label definitions don't correspond to any instructions - they just give a name to an address so that you can refer to it elsewhere. This holds even under assemblers where a label sits on a line of its own.

Data directives like db 0x90 or .byte 0x90 manually assemble bytes into the output file. Using such directives in a region that will be reached by execution lets you manually encode instructions, or create bugs if you did that by accident.
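To illustrate the two points above, here is a toy single-pass assembler for a made-up subset of x86 syntax (only nop, ret, labels and db are supported; everything about the input format is an assumption). Labels record an offset but emit nothing, while db copies bytes verbatim, so db 0x90 produces exactly the same byte as nop.

```python
# Toy assembler for a tiny x86 subset, showing which source lines emit bytes.
OPCODES = {"nop": b"\x90", "ret": b"\xc3"}  # real single-byte x86 instructions

def assemble(lines):
    """Return (machine_code, labels). Labels record an offset but emit no bytes."""
    code = bytearray()
    labels = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Label definition: remember the current offset, emit nothing.
            labels[line[:-1]] = len(code)
        elif line.startswith("db "):
            # Data directive: copy the listed byte values into the output verbatim.
            for item in line[3:].split(","):
                code.append(int(item.strip(), 0))
        else:
            # Ordinary instruction: look up its encoding.
            code += OPCODES[line]
    return bytes(code), labels
```

For example, assemble(["start:", "nop", "db 0x90", "end:", "ret"]) yields the three bytes 90 90 C3 and the labels {"start": 0, "end": 2}; a disassembler would show the db line as a second nop.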

Assemblers often support directives - lines that provide some guidance to the assembler itself, e.g. section .text or align 16 in NASM, or .globl and .align in GAS. Those don't correspond to CPU instructions, although they can sometimes be mistaken for genuine instructions.

Some assemblers support macros - think inline functions: a single macro invocation can expand into any number of instructions.
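Macro expansion is purely textual, happening before any encoding. A minimal sketch, assuming a hypothetical macro table and a made-up swap macro (real assemblers such as NASM or GAS use their own definition syntax), shows one source line turning into four instructions:

```python
# Hypothetical macro table: name -> (parameter names, body template lines).
MACROS = {
    # swap two 16-bit registers through the stack
    "swap": (["a", "b"], ["push {a}", "push {b}", "pop {a}", "pop {b}"]),
}

def expand(lines):
    """Expand macro invocations textually; other lines pass through unchanged."""
    out = []
    for line in lines:
        name, _, rest = line.strip().partition(" ")
        if name in MACROS:
            params, body = MACROS[name]
            args = [arg.strip() for arg in rest.split(",")]
            if len(args) != len(params):
                raise ValueError(f"{name} expects {len(params)} arguments")
            bindings = dict(zip(params, args))
            out.extend(template.format(**bindings) for template in body)
        else:
            out.append(line)
    return out
```

Here expand(["swap ax, bx", "ret"]) returns ["push ax", "push bx", "pop ax", "pop bx", "ret"], so the disassembly would never show swap.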


Some RISC assemblers, notably MIPS, have a notion of combined instructions - one line of assembly corresponds to a handful of instructions. (These are called pseudo-instructions.) Those are like built-in macros, provided by the assembler.

But depending on the operand, it might only need to assemble to 1 machine instruction. e.g. li $t0, 1 can assemble to ori $t0, $zero, 1 but li $t0, 0x55555555 needs both lui and ori (or addiu).
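The operand-dependent expansion of li can be sketched as follows. This selection order (ori for values that fit in 16 zero-extended bits, addiu for small negatives, lui alone when the low half is zero, otherwise lui + ori) is an assumption for illustration; a particular assembler (GNU as, MARS, etc.) may, for instance, prefer addiu for small positive values.

```python
def expand_li(reg, value):
    """Expand the MIPS pseudo-instruction 'li reg, value' for a 32-bit value."""
    value &= 0xFFFFFFFF
    upper, lower = value >> 16, value & 0xFFFF
    if upper == 0:
        # Fits in ori's zero-extended 16-bit immediate: one instruction.
        return [f"ori {reg}, $zero, {lower:#x}"]
    if upper == 0xFFFF and lower & 0x8000:
        # Negative value that fits in addiu's sign-extended 16-bit immediate.
        return [f"addiu {reg}, $zero, {lower - 0x10000}"]
    if lower == 0:
        # Only the upper half is set: lui alone is enough.
        return [f"lui {reg}, {upper:#x}"]
    # General case: load the upper half, then OR in the lower half.
    return [f"lui {reg}, {upper:#x}", f"ori {reg}, {reg}, {lower:#x}"]
```

With this policy, expand_li("$t0", 1) gives the single ori from the answer, and expand_li("$t0", 0x55555555) gives lui $t0, 0x5555 followed by ori $t0, $t0, 0x5555.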

On ARM, ldr r0, =0x5555 can choose between a PC-relative load from a literal pool or a movw, if assembling for an ARM CPU that supports movw with a 16-bit immediate. You wouldn't see ldr r0, =0x5555 in disassembly; you'd see whichever machine instruction(s) the assembler picked to implement it. (Editor's note: I'm not sure if any ARM assemblers will ever pick 2 instructions (movw + movt) for a wider constant for ldr reg, =value)
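A sketch of how an assembler might lower ldr reg, =value, under an assumed policy: try a plain mov (32-bit ARM immediates are an 8-bit value rotated right by an even amount), then mvn with the inverted value, then movw if the target has it and the value fits in 16 bits, and otherwise fall back to a literal-pool load. Which alternatives a real assembler tries, and in what order, varies; the <offset> placeholder is hypothetical since the real offset depends on where the pool ends up.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def is_arm_modified_imm(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    return any(rotl32(value, rot) <= 0xFF for rot in range(0, 32, 2))

def lower_ldr_const(reg, value, has_movw):
    """Return (instructions, literal_pool_words) for 'ldr reg, =value' (assumed policy)."""
    value &= MASK32
    if is_arm_modified_imm(value):
        return [f"mov {reg}, #{value:#x}"], []
    inverted = ~value & MASK32
    if is_arm_modified_imm(inverted):
        return [f"mvn {reg}, #{inverted:#x}"], []
    if has_movw and value <= 0xFFFF:
        return [f"movw {reg}, #{value:#x}"], []
    # Fallback: PC-relative load from a literal pool (offset is a placeholder).
    return [f"ldr {reg}, [pc, #<offset>]"], [value]
```

0x5555 is not a rotated 8-bit constant (its set bits span 15 bit positions), so it becomes movw r0, #0x5555 when movw is available and a literal-pool load otherwise.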


Do you count a procedure call as "multiple instructions per line"? There's CALL on Intel, BL on ARM. As far as the CPU docs are concerned, those are single instructions. They're just branches that also store a return address somewhere.

But if you're debugging and stepping over function calls instead of into them, they invoke a procedure/function/subroutine that may contain arbitrarily many instructions. Same goes for syscalls: an instruction like syscall or svc #0 is basically a function call into the kernel.
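The difference between "one instruction" and "one step in the debugger" can be shown with a toy interpreter. The instruction set (call, ret, work, halt) and the program are made up for illustration: call stores a return address on a stack (as x86 CALL does; ARM BL puts it in LR instead) and jumps, and the single call line ends up accounting for several executed instructions.

```python
def run(program, entry=0):
    """Interpret a toy program of (op, arg) tuples; return instructions executed."""
    pc, stack, executed = entry, [], 0
    while True:
        op, arg = program[pc]
        executed += 1
        if op == "call":
            stack.append(pc + 1)  # store the return address
            pc = arg              # then branch to the subroutine
        elif op == "ret":
            pc = stack.pop()      # branch back to the stored address
        elif op == "halt":
            return executed
        else:                     # "work": an ordinary instruction
            pc += 1

PROGRAM = [
    ("call", 2),     # 0: one line in the caller
    ("halt", None),  # 1
    ("work", None),  # 2: subroutine body starts here
    ("work", None),  # 3
    ("work", None),  # 4
    ("ret", None),   # 5
]
```

run(PROGRAM) executes 6 instructions in total; stepping over the call at index 0 runs 5 of them (the call, three work instructions and the ret) before the debugger stops at halt.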

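Assembly programs can definitely consume services from the operating system. How do you think regular programs do that? Whatever a high-level program can do, assembly can do too. The specifics (which instruction enters the kernel, the system call numbers, and how arguments are passed) vary by OS and architecture, though.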
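As a concrete case, here is a sketch that builds the x86-64 machine code for an exit system call on Linux, assuming the Linux x86-64 convention (syscall number in eax/rax, first argument in edi/rdi, exit is number 60). The helper name is hypothetical; the code only produces the bytes and does not execute them. Note that the last two bytes are the 0F 05 syscall instruction: the OS service is reached through an ordinary machine instruction.

```python
import struct

SYS_EXIT = 60  # exit syscall number on x86-64 Linux

def encode_exit(status):
    """Return machine code for: mov eax, 60 / mov edi, status / syscall."""
    return (
        b"\xb8" + struct.pack("<I", SYS_EXIT)           # mov eax, imm32 (B8 id)
        + b"\xbf" + struct.pack("<I", status & 0xFFFFFFFF)  # mov edi, imm32 (BF id)
        + b"\x0f\x05"                                   # syscall: enter the kernel
    )
```

encode_exit(0) produces the 12 bytes B8 3C 00 00 00 BF 00 00 00 00 0F 05.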