There is no generic answer; you have to look at each design separately, including what that design has to say about endianness. I don't see how you are trying to connect the dots between endianness and alignment. There are several very popular architectures, and looking at each of them in isolation, there is either no choice or one popular choice for endianness, and that architecture's alignment rules are completely independent of its endianness.
x86 is by definition a byte-oriented (8-bit) instruction set that started way back when with an 8- or 16-bit bus, depending on which part you bought or wired up, so by definition there is no alignment. And being a variable-length instruction set, where instructions vary in their number of individual bytes, it can't have alignment rules for instructions. As a result of that history it doesn't have alignment rules for data either, which further hurts its performance.
Take MIPS. Unfortunately I don't know its traditional endianness (I am guessing big), but folks are calling it bi-endian, which is always something that should set off alarms. Here again, endianness and alignment have no reason to be combined. MIPS started as an educational concept and remains one, as well as being physically built, or at least available as cores you can buy for your own designs. It was about performance, at the expense of the programmer, and enforcing alignment fits nicely with that. Naturally it makes sense for instruction fetches and data reads to follow the same rules: the instruction set was/is 32-bit instructions, and those are ideally aligned as well.
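As a rough illustration of what "enforcing alignment" means on a strict core, here is a minimal sketch of the natural-alignment test such a core effectively applies before a load, store or fetch. The helper name `is_naturally_aligned` is hypothetical, and the sketch assumes the access size is a power of two; it is not taken from any MIPS documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Returns true when addr is a multiple of size, i.e. the access is
 * "naturally aligned". Assumes size is a power of two (1, 2, 4, 8, ...),
 * so the low bits of addr selected by (size - 1) must all be zero. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}
```

With this check a 4-byte access at 0x1000 passes, while a 4-byte access at 0x1002 fails even though a 2-byte access at the same address would be fine. A core that enforces alignment would fault on the failing case instead of performing the access.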
ARM: from the early days ARM forced alignment, but even on the ARM7TDMI you could disable that, and despite what the ARM ARM said the behavior was predictable, just strange (the bytes rotate within the word rather than spilling over into the next word). Because of programmers made lazy by x86, newer cores are more tolerant of unaligned transfers: disable the fault trap and the result is what one would expect, with the access spilling over into the next word.
Here again it is listed as a bi-endian machine, but the sane solution is to go little endian; the tools and everything else make sense that way. Their big-endian scheme changed from BE-32 to BE-8 in ARMv6, causing further big-endian pain, so just stay away. The exception is the StrongARM, which became the XScale, which Marvell I think bought (or was it Cavium?). It defaulted to big endian (BE-32) and it was a royal pain to get working tools, but despite the parts being able to run little endian those communities ran big.
As I remember it, the ARM designs require alignment for instruction fetches, whereas data accesses don't have to be aligned if you disable the fault. With BE-8 the instructions are always little endian, independent of the big/little setting for data.
They also have a 16-bit instruction set, Thumb, and then the Thumb-2 extensions, which are variable-length Thumb. Think of those as variable-length sequences of 16-bit instructions rather than as 32-bit instructions, so they do not have to be word aligned. The decoder has to inspect the first 16-bit instruction to understand that the one that follows is connected to it, just like on an x86.
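The two unaligned-load behaviors described above for ARM (rotate within the word versus spill over into the next word) are easier to see with numbers. The following is a small software model, not real hardware access: the function names `load_le32`, `load_rotated` and `load_spill` are made up for this sketch, and it assumes a little-endian byte array standing in for memory.

```c
#include <stdint.h>

/* Assemble the little-endian 32-bit value starting at byte offset addr. */
static uint32_t load_le32(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}

/* Model of the ARM7TDMI-style result with the alignment fault disabled:
 * read the aligned word that contains addr, then rotate it right by
 * 8 bits for every byte of misalignment. No second word is touched. */
static uint32_t load_rotated(const uint8_t *mem, uint32_t addr)
{
    uint32_t word = load_le32(mem, addr & ~3u);
    unsigned rot = 8u * (addr & 3u);
    /* Guard rot == 0 to avoid an undefined shift by 32. */
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}

/* Model of the "what one would expect" result: the four bytes starting
 * at addr, spilling over into the next word when addr is unaligned. */
static uint32_t load_spill(const uint8_t *mem, uint32_t addr)
{
    return load_le32(mem, addr);
}
```

With memory holding the bytes 00 11 22 33 44 55 66 77 starting at address 0, a word load from address 1 gives 0x00332211 in the rotating model (byte 00 wraps around to the top) and 0x44332211 in the spill-over model (byte 44 comes from the next word). For an aligned address both models agree.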
RISC leaned toward performance compared with CISC, so RISC designs tend to have alignment rules, but there is no reason why someone couldn't make a fully unaligned RISC or a fully aligned CISC. Don't fall into the trap of generalizing any of this; you have to look at each architecture and/or core separately, even within a vendor or instruction set (XScale vs ARM7TDMI).
Alignment always affects performance: today, yesterday and tomorrow, on all systems. Sometimes the effect is smaller, sometimes larger, but you can't magically grow silicon or wires on the fly in your design, so you can't just change how the bus works and what can and can't fit in one clock cycle. So no new technology, unless it is strictly limited to byte-wide or bit-wide busses, can undo the alignment performance hit. And going back to 8-bit busses for the core interface is not faster; on die, wider is faster. Off chip, narrower is not faster but more manageable, which is why SATA wins over PATA: we can't keep a lot of high-speed parallel signals in step, so we have to serialize them (and can run many separate serialized interfaces that work together, as PCIe and Ethernet do). So with CPU core architectures alignment will always matter, as long as we are using binary states and a fixed number of bits per bus.
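To put a number on the "what can fit in one clock cycle" argument, here is a deliberately simplified model that counts how many bus-width transfers one access needs. The function `bus_transfers` is hypothetical, and the model assumes every transfer moves one bus-width chunk aligned to the bus width; real cores add caches, write buffers and burst transfers on top of this.

```c
#include <stdint.h>

/* Hypothetical model, for illustration only.
 * Counts the aligned, bus-width transfers needed to cover an access of
 * `size` bytes starting at `addr` on a bus that is `bus_bytes` wide.
 * Assumes size >= 1 and bus_bytes >= 1. */
static unsigned bus_transfers(uint32_t addr, unsigned size, unsigned bus_bytes)
{
    unsigned offset = addr % bus_bytes;        /* position inside the first chunk */
    return (offset + size + bus_bytes - 1) / bus_bytes;  /* round up */
}
```

On a 4-byte bus, a 4-byte access at address 0 needs one transfer, while the same access at address 1 straddles two chunks and needs two. A 2-byte access at address 2 still fits in one transfer, but at address 3 it needs two. The extra transfer is the cost that no amount of cleverness removes as long as the bus has a fixed width.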
bl) only need to be aligned to 16 bit. – fuz