instead of "statically" avoiding it by using addresses compatible with the types in question.
It's very easy to statically (at compile time) avoid ever doing pointer math with non-multiple-of-4 offsets on int*. Compilers don't have a hard time with this; in C you'd have to cast to uintptr_t and back, or do other tricks, to misalign an int*.
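Here is a minimal C sketch of why compilers get this for free. The function names are made up for the example. Indexing an int* always moves by whole multiples of sizeof(int), so the only way to get a misaligned address is to do integer math on a uintptr_t. It assumes a typical flat-memory target where converting a pointer to uintptr_t yields its byte address.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance from base to base[i]. The compiler scales i by
   sizeof(int), so this is always a multiple of sizeof(int). */
ptrdiff_t int_byte_offset(const int *base, size_t i)
{
    const int *p = base + i;
    return (const char *)p - (const char *)base;
}

/* Misaligning requires leaving the pointer type system: convert to an
   integer and add an odd offset. We return the integer instead of
   converting back to int*, because dereferencing a misaligned int* is
   undefined behaviour in C. Assumes uintptr_t holds the byte address. */
uintptr_t misaligned_int_address(const int *p)
{
    uintptr_t a = (uintptr_t)p;
    assert(a % alignof(int) == 0);   /* p itself is properly aligned */
    return a + 1;                    /* no longer a multiple of alignof(int) */
}
```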
Checking for aligned addresses in HW is very cheap (for an 8-byte access, just check that the low 3 bits are zero). Potentially faulting is also cheap if you only care about making the fast path fast: loads and stores already have to be able to page-fault for virtual memory anyway.
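In software terms, the hardware check is a single AND with a mask. Here is a sketch with a hypothetical helper name, assuming power-of-two access sizes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An access of `size` bytes (a power of two) is naturally aligned iff the
   low log2(size) address bits are zero. For size 8 the mask is 0x7, i.e.
   the low 3 bits; for size 4 it is 0x3. */
bool is_naturally_aligned(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);   /* power of two */
    return (addr & (size - 1)) == 0;
}
```

In hardware this is just a few gates on the low address bits, feeding the same exception path that page faults already use.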
This isn't a real problem that needed solving
The only downside of byte-addressable memory with alignment-required loads/stores is wasting those low address bits. (Plus, of course, the inefficiency when you do have a problem where unaligned loads or stores would be useful: not being able to do them is a cost whatever the cause.)
In real life, yes, sometimes we do have less alignment than we'd like, and detecting cases where your program runs slowly because of it can be useful. Modern x86 has perf counters for that. (It even has an alignment-check flag, but that's nearly unusable because standard libraries and compilers assume it's not set.)
There are some word-addressable machines, including some modern DSPs. But they still have only one "scale" for addresses; no part of the address space is byte-addressable.
Or is there a profound reason to prefer it this way (e.g. simplifying compilers)?
Yes, there is. If you want to efficiently zero an array before using it as an array of bool or a string of char, that zeroing can happen with 8-byte or even wider SIMD stores, using the same address you'll later use for byte accesses to the same data. That's just one example.
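Here is a minimal sketch of that zeroing in C. zero_with_wide_stores is a hypothetical name, not a standard function; optimized memset implementations already do this, often with wider stores. The buffer is cleared mostly with 8-byte stores, and afterwards the caller accesses the very same addresses as individual bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void zero_with_wide_stores(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    unsigned char *p = buf;
    const uint64_t zero = 0;
    size_t i = 0;

    /* Bulk of the buffer: 8 bytes per store. memcpy from a uint64_t is the
       portable way to express a possibly unaligned 8-byte store; compilers
       typically emit a single store instruction for it. */
    for (; i + sizeof zero <= len; i += sizeof zero)
        memcpy(p + i, &zero, sizeof zero);

    /* Tail: the remaining 0..7 bytes, one at a time. */
    for (; i < len; i++)
        p[i] = 0;
}
```

After the call, ordinary byte accesses like s[5] read the zeroed data through the same addresses the wide stores wrote. No conversion between a "qword address" and a "byte address" is needed.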
It's also not rare to want to use a wider load or store to copy around multiple elements of a struct. Speaking of which, how would a struct work? Normally you have one address for the base of the struct and can access any of the members at fixed offsets from that. With your scheme, would you have to right-shift the address to undo the implicit left shift of qld?
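With byte addressing, member access is just the base address plus a compile-time constant offset, whatever the member's width. Several adjacent members can also be moved with one wider copy. Here is a sketch with a made-up struct layout, assuming a flat address space where uintptr_t holds the byte address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A made-up struct mixing 1-, 2-, 4- and 8-byte members. */
struct record {
    uint8_t  tag;
    uint8_t  flags;
    uint16_t len;
    uint32_t id;
    uint64_t value;
};

/* Every member is at (base address + constant offset), whatever its width. */
uintptr_t id_address(const struct record *r)
{
    return (uintptr_t)r + offsetof(struct record, id);
}

/* Copy tag, flags and len with one memcpy covering everything before `id`.
   On typical ABIs that's 4 bytes, which compilers usually turn into a
   single 4-byte load and store. */
void copy_header(struct record *dst, const struct record *src)
{
    assert(dst != src);   /* memcpy requires non-overlapping objects */
    memcpy(dst, src, offsetof(struct record, id));
}
```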
Also, a memory allocator like malloc can use the same address regardless of how its caller wants to use the memory. Do you have to scale an address (by shifting) back to some standard scale to free it?
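Here is a sketch of the malloc point using only standard C. The helper name is hypothetical. One block is viewed both as a qword and as bytes through the same address, then freed through that same address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Store one 8-byte word, read back the lowest-addressed byte through the
   same address, then free that address. Returns -1 if malloc fails. */
int first_byte_after_word_store(uint64_t word)
{
    void *mem = malloc(sizeof(uint64_t));
    if (mem == NULL)
        return -1;

    uint64_t *q = mem;                 /* view the block as a qword */
    unsigned char *b = mem;            /* same block, viewed as bytes */
    assert((void *)q == (void *)b);    /* one address, two access widths */

    *q = word;                         /* one 8-byte store */
    int first = b[0];                  /* byte load; char may alias anything */

    free(mem);                         /* free the same pointer, unscaled */
    return first;
}
```

Which byte b[0] sees depends on endianness, so a word with all bytes equal (e.g. 0x4242424242424242) gives the same answer, 0x42, on any machine.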
Having multiple addresses for the same storage cells is "aliasing" and tends to cause problems.
Or are you picturing that byte address-space is totally disjoint from qword address-space? If so, how do you efficiently copy data from one to the other without being able to store more than 1 byte at a time? Maybe with SIMD loads/stores that are available for every address-space?
Or is the bottom 4GiB of memory byte-addressable (as well as word, dword, and qword), with the 4 to 8GiB range above it addressable only as 16-bit or wider chunks? And the next 8GiB above that usable only as 32-bit or wider chunks (the top half of the dstr address range), and so on?
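To make that layout concrete, here is a sketch that maps a location to the narrowest access width whose scaled address range reaches it. It assumes the question's hypothetical ISA has 32-bit address registers that are implicitly shifted left by 0, 1, 2 or 3 bits for byte, word, dword and qword accesses. This doesn't describe any real machine, and the function name is invented.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/* Returns the narrowest access width in bytes (1, 2, 4 or 8) whose
   32-bit, implicitly shifted address space reaches byte location `loc`,
   or 0 if no width reaches it. */
unsigned min_access_bytes(uint64_t loc)
{
    for (unsigned shift = 0; shift <= 3; shift++) {
        uint64_t reach = (4 * GiB) << shift;   /* 4, 8, 16, 32 GiB */
        if (loc < reach)
            return 1u << shift;
    }
    return 0;
}
```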
If so, do you have a separate malloc allocator for each size, so that memory you need to access as bytes is limited to the low 4GiB (the whole 32-bit bstr space corresponding to qstr addresses that fit in the low 29 bits?), since some memory isn't accessible as bytes at all? The qstr address space could be thought of as implicitly left-shifting by 3 to create a 35-bit, 8-byte-aligned byte address.
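Here is the address arithmetic for that hypothetical scheme, sketched in C. The function names are invented for illustration. bstr and qstr are taken to be 32-bit byte and qword addresses as in the question.

```c
#include <assert.h>
#include <stdint.h>

/* A qstr address names an 8-byte chunk: the byte address it denotes is
   the qstr shifted left by 3, giving a 35-bit, 8-byte-aligned address. */
uint64_t qstr_to_byte_addr(uint32_t qaddr)
{
    return (uint64_t)qaddr << 3;
}

/* A 32-bit bstr reaches only the first 4 GiB, i.e. qstr addresses whose
   top 3 bits are zero (the low 2^29 of them). */
int qstr_is_byte_reachable(uint32_t qaddr)
{
    return (qaddr >> 29) == 0;
}

/* Convert a qword-aligned bstr to the qstr naming the same storage. */
uint32_t bstr_to_qstr(uint32_t baddr)
{
    assert((baddr & 7u) == 0);   /* must be 8-byte aligned */
    return baddr >> 3;
}
```

Every pointer handed between "byte" and "qword" code would need one of these conversions, which is exactly the friction with malloc, free, and struct access described above.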
This sounds really hard to deal with, or at least doesn't fit the standard model that software is used to, and that's enshrined in languages like C. It would be more normal to use a mechanism like segmentation to expand your address-space uniformly so you could use any of it for strings, byte-arrays, and so on.
I guess virtual memory would still work in terms of fixed-size physical pages, and the OS could back any virtual page in the effectively 35-bit virtual address space with one of those 4KiB pages. But that means one page spans only 4096/8 = 512 qstr addresses.
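Here is the page arithmetic under the same hypothetical scheme, assuming 4 KiB pages. A qstr page number comes from bits [31:9] of the qstr, while a 32-bit byte address uses bits [31:12]. So where an address gets split into page number and offset depends on the access width. The names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u                    /* bytes per page (assumed 4 KiB) */
#define QWORDS_PER_PAGE (PAGE_SIZE / 8u)   /* qstr addresses per page */

static_assert(QWORDS_PER_PAGE == 512u, "4 KiB page / 8-byte qwords");

/* Page number and byte offset within the page for a qstr address. */
uint32_t qstr_page_number(uint32_t qaddr) { return qaddr / QWORDS_PER_PAGE; }
uint32_t qstr_page_offset_bytes(uint32_t qaddr) { return (qaddr % QWORDS_PER_PAGE) * 8u; }

/* Page number for a plain byte address in the 35-bit space. */
uint64_t byte_page_number(uint64_t baddr) { return baddr / PAGE_SIZE; }
```

Both views agree on which physical page backs a given location. For example, qstr 513 is byte address 4104, which is on page 1 at byte offset 8 either way.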
(I'm assuming a register width of 32 bits for these examples. If you have 64-bit registers, you don't need any inconvenient tricks to expand the address space, you can just use byte addresses.)