Preventing unaligned accesses at the instruction set level

Question

Are there instruction sets in which unaligned accesses are prevented by using non-byte addresses?

As far as I know, most architectures use byte addresses everywhere, but penalize or throw exceptions for unaligned accesses.

Wouldn't it make sense to prevent this at the instruction level, e.g. a bstr for storing bytes uses byte addresses, dstr stores words using word addresses, qstr stores quads using quad addresses, etc? Thus no exceptions nor penalties, and as a bonus it may even increase the range of accessible memory space (otherwise lower bits are wasted).

From what I could find about x86, ARM, Alpha, Itanium, etc, they always use byte addressing, but require the user to ensure some instructions are only used with aligned addresses, leading to a runtime exception/penalty, instead of "statically" avoiding it by using addresses compatible with the types in question.

Did I miss something? Or is there a profound reason to prefer it this way (e.g. simplifying compilers)?

"Wouldn't it make sense to prevent this at the instruction level, e.g. a bstr for storing bytes uses byte addresses, dstr stores words using word addresses ..." No, because that would require the system to have multiple addressing modes, which would make things a lot more complex. — Andrew Henle
This used to be a popular approach until about the 1970s, but it is annoying to use, doesn't mesh well with high level languages and doesn't bring that much of an advantage anyway. — fuz
The compiler has a lot of things it has to get right to generate code that works; aligning addresses is a small part of that. On ARM, for example (and I would assume MIPS), some instructions are word-based rather than byte-based, generally jumps and the like, to increase the reach of the immediate field in the instruction. Forcing word-only addressing would make more work for the compiler, make the code harder to read when disassembling or debugging, and make it harder to permit unaligned accesses later, after the initial creation of the instruction set. — old_timer
You wouldn't gain anything by putting byte/word/dword addresses in registers in order to perform some task. It would take more architecture design work to use them that way to gain further reach into memory (a 16-bit access being able to reach twice as much memory as an 8-bit one). It would definitely cause more code to be generated: some addresses are naturally byte-based, and to use one with what you propose you would have to shift it first, creating more instructions. — old_timer
It is far easier to just have the compiler follow the architecture's rules, which may include alignment. — old_timer

Peter Cordes · Accepted Answer · 2019-10-28T09:45:34

instead of "statically" avoiding it by using addresses compatible with the types in question.

It's very easy to statically (at compile time) avoid ever doing pointer math with non-multiple-of-4 offsets on int*. Compilers don't have a hard time with this; in C you'd have to cast to uintptr_t and back, or do other tricks, to misalign an int*.
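As a rough illustration of that point, here is a minimal C sketch. The helper names (int_ptr_is_aligned, element, misaligned_address) are made up for this example, and it assumes a platform where alignof(int) is greater than 1. Ordinary indexing keeps an int* aligned because the compiler scales the index by sizeof(int); getting an odd address takes an explicit detour through uintptr_t.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of alignof(int). */
static bool int_ptr_is_aligned(const int *p) {
    return ((uintptr_t)p % alignof(int)) == 0;
}

/* Ordinary pointer arithmetic: the compiler scales i by sizeof(int),
   so an aligned base always yields an aligned element address. */
static const int *element(const int *base, size_t i) {
    return base + i;
}

/* The "trick" needed to get a misaligned address: leave the type system
   by casting to an integer and adding an unscaled byte offset. The result
   is kept as an integer here, because converting it back to int* and
   dereferencing it would be undefined behavior in C. */
static uintptr_t misaligned_address(const int *base) {
    return (uintptr_t)base + 1;
}
```

Nothing here needs hardware help: as long as code stays within int* arithmetic, the alignment guarantee is already static.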

Checking for aligned addresses in HW is very cheap (for an 8-byte access, just check that the low 3 bits are zero). Potentially faulting is also cheap if you only care about making the fast-path fast. Loads/stores already have to be able to page-fault for virtual memory anyway.
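In software terms, the check described above is a single AND against a mask. A minimal sketch follows; is_aligned is a hypothetical helper name, and it assumes the access size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is a multiple of size.
   size must be a power of two (1, 2, 4, 8, ...), so size - 1 is a mask
   of exactly the low address bits that have to be zero. */
static bool is_aligned(uintptr_t addr, uintptr_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

For an 8-byte access the mask is 7, i.e. the low 3 bits; for a 4-byte access it is 3, i.e. the low 2 bits.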

This isn't a real problem that needed solving

The only downside of byte-addressable memory with alignment-required loads/stores is wasting those low address bits. (Or, of course, if you do have a problem where unaligned loads or stores would be useful, not being able to do them is inefficient whatever the cause.)

In real life, yes, sometimes we do have less alignment than we'd like, and detecting cases where your program runs slowly because of it can be useful. We have perf counters for that on modern x86. (And even an alignment-check flag, but that's nearly unusable because standard libraries and compilers assume it's not set.)

There are some word-addressable machines, including some modern DSPs. But they still only have one "scale" for addresses, with no part of the address space being byte-addressable.

Or is there a profound reason to prefer it this way (e.g. simplifying compilers)?

Yes, there is. If you want to efficiently zero an array before using it for an array of bool or a string of char, that zeroing can happen with 8-byte or even wider SIMD stores, using the same address you'll use for byte accesses to the same data. That's just one example.
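A minimal sketch of that zeroing pattern in portable C, assuming nothing beyond the standard library (zero_wide is a hypothetical name; in practice you would just call memset, which does this kind of thing internally). The point is that the same byte address serves both the 8-byte stores and the later byte accesses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero n bytes at dst using 8-byte stores where possible, then finish
   any remaining tail one byte at a time. A memcpy of a fixed 8 bytes is
   the portable way to express a possibly-unaligned 8-byte store;
   compilers typically turn it into a single store instruction. */
static void zero_wide(void *dst, size_t n) {
    unsigned char *p = dst;
    const uint64_t zero = 0;
    size_t i = 0;
    for (; i + sizeof zero <= n; i += sizeof zero) {
        memcpy(p + i, &zero, sizeof zero);
    }
    for (; i < n; i++) {
        p[i] = 0;
    }
}
```

With per-size address scales, the pointer passed to zero_wide and the pointer later used to read individual bool or char elements would have to be different values for the same storage.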

It's also not rare to want to use a wider load or store to copy around multiple elements of a struct. Speaking of which, how would a struct work? Normally you have one address for the base of the struct and can access any of the members at fixed offsets from that. With your scheme, would you have to right-shift the address to undo the implicit left shift of qld?
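To make the struct question concrete, here is a small sketch. The struct layout is ordinary C; scaled_address is a hypothetical helper that models the question's scheme, in which an access of 2^k bytes takes the byte address shifted right by k. None of this is a real ISA.

```c
#include <stddef.h>
#include <stdint.h>

/* With byte addressing, every member is base + a fixed byte offset. */
struct record {
    uint64_t id;    /* byte offset 0  */
    uint32_t count; /* byte offset 8  */
    uint8_t  tag;   /* byte offset 12 */
};

/* Hypothetical per-size scaling from the question: the address operand
   of an access of (1 << log2_size) bytes is the byte address shifted
   right by log2_size. */
static uint64_t scaled_address(uint64_t byte_addr, unsigned log2_size) {
    return byte_addr >> log2_size;
}
```

For a struct at byte address 0x1000, the three members would need three different address operands (0x200 for the qword, 0x402 for the dword, 0x100C for the byte), each derived from the base with an extra add and shift, instead of one base register plus constant displacements.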

Also, a memory allocator like malloc can use the same address regardless of how its caller wants to use the memory. Do you have to scale an address (by shifting) back to some standard scale to free it?

Having multiple addresses for the same storage cells is "aliasing" and tends to cause problems.

Or are you picturing that byte address-space is totally disjoint from qword address-space? If so, how do you efficiently copy data from one to the other without being able to store more than 1 byte at a time? Maybe with SIMD loads/stores that are available for every address-space?

Or is the bottom 4GiB of memory byte-addressable (as well as word, dword, and qword), and above that the 4 to 8GiB range only addressable as 16-bit or wider chunks? And the next 8GiB above that is only usable as 32-bit chunks (the top half of the dstr address range), and so on?

If so, do you have a separate malloc allocator for each size, so that memory which must be accessible as bytes is limited to the low 4GiB (32-bit bstr addresses = the low 29 bits of the 32-bit qstr address-space), because some memory isn't accessible as bytes? The qstr address space could be thought of as implicitly left-shifting by 3 to create a 35-bit byte address aligned by 8 bytes.
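The arithmetic behind those ranges can be sketched as follows, assuming 32-bit address registers and the question's hypothetical bstr/qstr instructions. All three helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scheme: a 32-bit address used by an access of
   (1 << log2_size) bytes names byte address addr << log2_size. */
static uint64_t byte_address(uint32_t addr, unsigned log2_size) {
    return (uint64_t)addr << log2_size;
}

/* Total bytes reachable by accesses of a given size: 2^32 addresses,
   each naming a chunk of (1 << log2_size) bytes. */
static uint64_t reachable_bytes(unsigned log2_size) {
    return (UINT64_C(1) << 32) << log2_size;
}

/* True if a byte address can also be named by a 32-bit bstr address,
   i.e. it lies in the low 4GiB. */
static bool byte_addressable(uint64_t addr) {
    return addr < (UINT64_C(1) << 32);
}
```

With log2_size = 3, reachable_bytes gives 2^35 bytes (32GiB), but only qstr addresses below 2^29 land in the byte-addressable low 4GiB; everything above that would need its own allocator.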

This sounds really hard to deal with, or at least doesn't fit the standard model that software is used to, and that's enshrined in languages like C. It would be more normal to use a mechanism like segmentation to expand your address-space uniformly so you could use any of it for strings, byte-arrays, and so on.

I guess virtual memory would still work in terms of fixed-size physical pages, and the OS could back any virtual page in the effectively 35-bit virtual address space with one of those 4k pages. But that means a page spans only 4096/8 = 512 qword addresses for qstr.

(I'm assuming a register width of 32 bits for these examples. If you have 64-bit registers, you don't need any inconvenient tricks to expand the address space; you can just use byte addresses.)
