Assembly - Do instructions have to be aligned even in architectures that allow misaligned data?

Question

I am aware that some little-endian architectures like the Intel x86 allow misaligned data access. Of course, intuitively speaking, misalignment is not very clever as it could deteriorate performance (not necessarily for modern chips as defended here). So, unaligned data accesses are perhaps not good but they are legit in some architectures. Fine.

Until recently, I thought the same applied to unaligned instructions: if a given processor handles misalignment in data, what could prevent it from doing the same with code? However, Linda Null's book (here) states pretty clearly the opposite:

Typically, the real-life compromise involves using two to three instruction lengths, which provides bit patterns that are easily distinguishable and simple to decode. The instruction length must also be compared to the word length on the machine. If the instruction length is exactly equal to the word length, the instructions align perfectly when stored in main memory. Instructions always need to be word aligned for addressing reasons. Therefore, instructions that are half, quarter, double or triple the actual word size can waste space. Variable length instructions are clearly not the same size and need to be word aligned, resulting in loss of space as well.

On the other hand, many books on Google Books talk quite naturally about unaligned instructions. So, my questions are:

are unaligned instructions allowed in some architecture or not?
is alignment required for any big endian architectures? It is my understanding that misalignment is possible on little endian processors only.

Am I right on that?

I do hope you can help me to clarify these questions for me. Thank you very much, folks!!

This question is hard to answer as a platform where instructions can be unaligned can also be seen as a platform where the alignment requirement for instructions is 1. Alignment for instructions can be smaller than instruction length; for example, on ARM thumb, 32 bit long instructions (like bl) only need to be aligned to 16 bit. — fuz
Processors can handle unaligned instructions just like they do with data. Some RISC simply omits the lower bits in the operand of control transfer instructions because they have limited space. Also, the fetch unit can be different from (more complex) the load unit. Since the code is created for a specific architecture (while data is general), imposing constraints on it is reasonable and doesn't burden anyone too much. — Margaret Bloom
Architectures with variable length instructions such as x86 typically (always?) allow misaligned instructions, since otherwise they would effectively cancel the variable length advantages. That said, they still like stuff such as branch targets, especially loops, to be aligned. — Jester
To Jester and @MargaretBloom - That makes a lot of sense to me: there would be no sense in allowing variable length instructions and then banish them for alignment reasons. It sounds quite natural. However, I've seen people in forums saying that variable length instruction must be "normalized" (i.e., aligned) by using NOP instructions, which made me wonder: do x86 actually support unaligned instructions, or we (the compiler, actually) must "pad" memory with NOPs, therefore wasting space? — Humberto Fioravante Ferro

old_timer · Accepted Answer · 2017-02-10T15:33:31

There is no generic answer; you have to look at each design separately, including what that design has to say about endianness. I don't see how you are trying to connect the dots between endianness and alignment. There are very popular architectures, and looking at each of them in isolation, there is either no choice or a popular choice for endianness, and the alignment rules within that architecture are completely independent of its endianness.

x86 by definition is an 8-bit instruction set that started way back when with an 8- or 16-bit bus, depending on which part you bought or wired up, so by definition there is no alignment. Also by definition, being a variable-length instruction set whose instructions vary in length by individual bytes, it can't have alignment rules. And as a result of its history it doesn't have alignment rules for ordinary data accesses either, which further hurts its performance.
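To make the x86 point concrete, here is a minimal Python sketch that packs a tiny hand-assembled x86-64 function back to back and prints where each instruction starts. The `CODE` table and the `layout` helper are names made up for this illustration; the byte encodings are the standard ones for these five instructions.

```python
# Hand-assembled x86-64 function: (machine-code bytes, mnemonic in Intel syntax).
CODE = [
    (bytes([0x55]), "push rbp"),
    (bytes([0x48, 0x89, 0xE5]), "mov rbp, rsp"),
    (bytes([0xB8, 0x2A, 0x00, 0x00, 0x00]), "mov eax, 42"),
    (bytes([0x5D]), "pop rbp"),
    (bytes([0xC3]), "ret"),
]


def layout(code, base=0):
    """Return (address, length, mnemonic) for instructions packed back to back."""
    out = []
    addr = base
    for encoding, text in code:
        out.append((addr, len(encoding), text))
        addr += len(encoding)  # the next instruction starts on the very next byte
    return out


if __name__ == "__main__":
    for addr, length, text in layout(CODE):
        print(f"offset {addr:2d}  length {length}  {text}")
```

The instructions start at offsets 0, 1, 4, 9 and 10, so most of them sit on addresses that are not multiples of 2 or 4, and no padding is needed between them.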

Take MIPS. Unfortunately I don't know its traditional endianness; I am guessing big, but folks are calling it bi-endian, which is always something that should set off alarms. But here again endianness and alignment have no reason to be combined. MIPS started as an educational concept and remains one, as well as being physically built, or at least sold as cores you can buy for your own designs. It was about performance, at the cost of pain for the programmer, and enforcing alignment fits nicely with that. Naturally it makes sense for instruction fetches and data reads to follow the same rules; the instruction set was and is 32-bit instructions, and those are ideally aligned as well.

ARM: from the early days ARM forced alignment, but even with the ARM7TDMI you could disable that, and despite what the ARM ARM said the behavior was predictable, just strange (it rotated within the word rather than spilling over into the next word). Because of lazy programmers, thanks to x86, newer cores are more tolerant of unaligned transfers: you disable the fault trap and the result is what one would expect, spilling over into the next word.

Here again it is listed as a bi-endian machine, but the sane solution is to go little endian; the tools and everything else make sense that way. Their big-endian scheme changed from BE-32 to BE-8 in ARMv6, causing further big-endian pain, so just stay away. The exception is the StrongARM, which became the XScale, which Marvell I think bought (or was it Cavium?); it defaulted to big endian (BE-32) and was a royal pain to get working tools for, but despite being able to run little endian those communities ran big.

As I remember it, the ARM designs require alignment for instruction fetches, whereas data doesn't have to be aligned if you disable the fault. And the instructions are always little endian, independent of the big/little setting. They also have a 16-bit instruction set, Thumb, and then the Thumb-2 extensions, which are variable-length Thumb. Those do not have to be aligned to 32 bits: think of them as variable-length sequences of 16-bit instructions rather than as 32-bit instructions. The decoder has to inspect the first 16-bit instruction to learn that the one that follows is connected to it. Just like an x86.
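Two of the ARM behaviours mentioned here can be sketched in Python. This is only a rough model with made-up helper names (`thumb_insn_size`, `thumb_layout`, `ldr_rotated`, `ldr_spill`), not anything taken from ARM's tools. The length rule assumes the usual Thumb-2 encoding, where a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, and the loads assume little-endian memory.

```python
MASK32 = 0xFFFFFFFF


def thumb_insn_size(first_halfword):
    """Size in bytes of the Thumb instruction that starts with this halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def thumb_layout(halfwords):
    """Return (byte offset, size) for each instruction in a halfword stream."""
    out, i = [], 0
    while i < len(halfwords):
        size = thumb_insn_size(halfwords[i])
        out.append((2 * i, size))
        i += size // 2  # skip the second halfword of a 32-bit instruction
    return out


def ldr_rotated(mem, addr):
    """Old-style unaligned word load: aligned word rotated right by 8 * (addr % 4)."""
    base = addr & ~3
    word = int.from_bytes(mem[base:base + 4], "little")
    rot = 8 * (addr & 3)
    return ((word >> rot) | (word << (32 - rot))) & MASK32


def ldr_spill(mem, addr):
    """Unaligned word load that spills over into the next word."""
    return int.from_bytes(mem[addr:addr + 4], "little")


if __name__ == "__main__":
    # bx lr, bl (two halfwords), bx lr
    print(thumb_layout([0x4770, 0xF000, 0xF800, 0x4770]))
    mem = bytes.fromhex("0011223344556677")
    print(hex(ldr_rotated(mem, 1)), hex(ldr_spill(mem, 1)))
```

In the sample stream the 32-bit `bl` starts at byte offset 2, which is halfword aligned but not word aligned. For the loads, reading a word at address 1 gives 0x00332211 in the rotating model and 0x44332211 in the spill-over model.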

RISC leaned toward performance more than CISC did, so RISC designs tend to have alignment rules, but there is no reason why someone couldn't make a fully unaligned RISC or a fully aligned CISC. Don't let yourself fall into the trap of generalizing any of this; you have to look at each architecture and/or core separately, even within a vendor or instruction set (XScale vs ARM7TDMI).

Alignment always affects performance, today, yesterday and tomorrow, on all systems. Sometimes the effect is smaller, sometimes larger, but you can't magically grow silicon or wires at will on the fly in your design, so you can't just change how the bus works and what can and can't fit in one clock cycle. So there is no new technology that can undo the alignment performance hits, unless it is strictly limited to byte-wide or bit-wide busses. And going back to 8-bit busses for the core interface is not faster; wider is faster on die. Off chip, narrower is not faster but more manageable, so SATA wins over PATA simply because we can't keep a lot of high-speed parallel signals in step and have to serialize them (you can have many separate serialized interfaces that work together, as in PCIe or Ethernet). So with CPU core architectures alignment will always matter, as we are using binary states and a fixed number of bits per bus.
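The fixed-width bus argument can be illustrated with a deliberately simplified Python model. It assumes a bus that can only move one naturally aligned block of `bus_bytes` per transaction and ignores caches, buffering and everything else a real core does; `bus_transactions` is a hypothetical helper written for this sketch.

```python
def bus_transactions(addr, size, bus_bytes=4):
    """Count aligned bus-width blocks touched by an access of `size` bytes at `addr`."""
    first_block = addr // bus_bytes
    last_block = (addr + size - 1) // bus_bytes
    return last_block - first_block + 1


if __name__ == "__main__":
    for addr in range(4):
        n = bus_transactions(addr, 4)
        print(f"4-byte access at address {addr}: {n} transaction(s)")
```

On a 4-byte bus, a 4-byte access at address 0 fits in one transaction, while the same access at address 1, 2 or 3 straddles two blocks and needs two. Making the bus wider reduces how often that happens but cannot eliminate it.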
