3
votes

I am studying the ARM instruction architecture, and I have read that instructions are stored word-aligned, so the least significant two bits of instruction addresses are always zero in ARM state.

Thumb and Thumb-2 instructions are either 16 or 32 bits long. Instructions are stored half-word aligned, so the least significant bit of instruction addresses is always zero in Thumb state.

In some of my earlier work with other microcontrollers such as AVR, I used the least significant bit of a program memory address to select whether the higher or lower byte was accessed. But that was in the context of data access.

In ARM state the instructions are 32 bits wide anyway, so all four bytes should be fetched at once.

Why, then, do the addresses have the last two bits (one bit in Thumb state), which would select a particular byte of the instruction, and why are banks used?

PS: If I were to fetch each byte of a 4-byte instruction individually, it would take 4 cycles, which is very inefficient. So what is the purpose of having byte addressability? Is it because the new Thumb instructions are 16 bits wide but still occupy 32 bits of space?

3
It's not really clear what you're looking for clarification on; your first two paragraphs are the answer to your question! – Oliver Charlesworth
Why would they have banks in either mode, ARM or Thumb, to access an individual byte of a 32/16-bit instruction? An instruction should be fetched as a whole word, right? – Haswell
If I were to fetch each byte of a 4-byte instruction individually, it would take 4 cycles, which is very inefficient. So what is the purpose of having byte addressability? Is it because the new Thumb instructions are 16 bits wide but still occupy 32 bits of space? – Haswell
Thumb instructions do not occupy 32 bits of space; that is the whole point of using them. In general terms, code space is byte addressable rather than half-word addressable for consistency with data space. Some embedded or cache-equipped ARM designs may be built with a semi-Harvard architecture as an efficiency, but they do not require special instructions for data access to code memory. Note that unaligned word access is usually prohibited for both. – Chris Stratton
The purpose of having byte addressability is that there are accesses to memory for things other than fetching instructions, and byte-level addressability is often useful for those scenarios. Note that there are architectures that are word addressable with word sizes other than 8-bit bytes, but they aren't as common as 8-bit addressable machines (for one thing, POSIX requires 8-bit addressability, if I recall correctly). Also, there are some ARM architectures (such as the Cortex-M3) that have limited bit-level addressability. – Michael Burr

3 Answers

5
votes

I think you are again mixing up instruction access with data access. As far as data access is concerned, we may use the last two bits to fetch any byte of a 4-byte word.

But the concept of not using the last two bits has nothing to do with accessing an individual byte of a 32-bit instruction. As you said, accessing one byte at a time for instruction access is highly inefficient, and it is not permitted either. So, to enforce this rule (instructions must not start at addresses that are not multiples of 4), the last two bits are not considered. The following diagram will explain this:

The addresses are 32 bit:

|--0x00000007--|--0x00000006--|--0x00000005--|--0x00000004--|

|--0x00000003--|--0x00000002--|--0x00000001--|--0x00000000--|

Focus on the last nibble:

| 3-0011; 2-0010; 1-0001; 0-0000; |

| 7-0111; 6-0110; 5-0101; 4-0100; |

Now focus on the two least significant bits. Our aim is not to allow an instruction to start at locations 1, 2, 3, 5, 6 or 7, so the two LSBs cannot be 01, 10 or 11. Only "00" is allowed as the two LSBs. Since they are always 00 when the generated address is a multiple of 4, they may as well be ignored.

Hope you can visualize better.
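The alignment rule described above can also be written as a bit-mask test. This is only a minimal sketch in C; the helper names (`is_aligned`, `valid_arm_fetch_addr`, `valid_thumb_fetch_addr`) are made up for illustration and do not come from any ARM header or library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of size.
   size must be a power of two so that (size - 1) is a mask of the low bits. */
static bool is_aligned(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* ARM state: 4-byte instructions, so the two LSBs must be 00. */
static bool valid_arm_fetch_addr(uint32_t addr)
{
    return is_aligned(addr, 4);
}

/* Thumb state: halfword alignment, so only the LSB must be 0. */
static bool valid_thumb_fetch_addr(uint32_t addr)
{
    return is_aligned(addr, 2);
}
```

With these helpers, addresses 0 and 4 pass the ARM check while 1, 2, 3, 5, 6 and 7 fail it; address 2 passes the Thumb check but not the ARM one.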

2
votes

Before Thumb, all ARM instructions were 32 bits (4 bytes), and they have to be aligned, so the lower two bits of instruction addresses are always zero. Then Thumb comes along with 16-bit instructions, so the lower bit of the address is always zero. They added a nuance: when using bx or blx to switch modes, the lsbit is used to distinguish between Thumb and ARM. If the lsbit is a zero when fed to bx or blx, then it stays in or switches to ARM mode; if it is 1, it stays in or switches to Thumb mode. Note that the lsbit is stripped off the address when it is placed in the pc; it is consumed. While running in either mode the pc lsbit is always zero, and bit one is always zero in ARM mode.
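As a rough illustration of how the lsbit is consumed by bx/blx, here is a small C model. It is a simplified sketch, not a description of any particular core: the type `branch_result` and the function `bx_target` are hypothetical names, and an ARM-state target with bit 1 set is simply masked here, which real cores may not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state after a bx/blx to a given target. */
typedef struct {
    uint32_t pc;    /* address actually placed in the pc */
    bool     thumb; /* true: Thumb state, false: ARM state */
} branch_result;

static branch_result bx_target(uint32_t target)
{
    branch_result r;
    /* Bit 0 selects the instruction set and is then stripped off. */
    r.thumb = (target & 1u) != 0;
    /* Thumb: clear bit 0. ARM: clear bits 1:0 (simplification, see lead-in). */
    r.pc = r.thumb ? (target & ~1u) : (target & ~3u);
    assert((r.pc & 1u) == 0); /* the pc lsbit is zero in either state */
    return r;
}
```

For example, a target of 0x8001 selects Thumb state with the pc at 0x8000, while a target of 0x8000 selects ARM state with the same pc.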

ARM buses are typically 32 or 64 bits wide, and it is not a variable-length instruction set like x86 (it is with Thumb-2 now, but that isn't quite the same). So you are not extracting individual bytes and then extracting more bytes to isolate instructions (not that a modern variable-length instruction set does it that inefficiently). An ARM core may fetch something like 8 instructions at a time, which would be 4 clock cycles (once the handshakes are over) on a 64-bit data bus. That is with the cache off, of course; with the cache it is the same or more than that. Each core/architecture is different in its fetches, and the memory controller has to handle all the valid cycle types, from one byte on any lane up to the full width of the bus.

I don't know what you mean by banks. As programmers we think in terms of byte-based addresses, as a byte is our smallest addressable item. When you get to the actual RAMs, hardware folks start stripping off address bits they are not using, so their lsbit may be different from ours. When you write a single byte, some processor buses won't put the whole byte address on the bus; they may only put the word or double-word address on the bus (2 or 3 lsbits of zero) and then use a byte mask to tell which byte lanes contain new data and which byte lanes you have to preserve at the target.
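To make the byte-mask idea concrete, here is a minimal C sketch of a generic 32-bit bus that sends a word address plus one enable bit per byte lane. It is not modelled on any specific bus protocol; the names `bus_write` and `make_write` are hypothetical, and it assumes little-endian lane numbering (lane 0 holds the byte at the lowest address) and naturally aligned accesses only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one write on a 32-bit bus. */
typedef struct {
    uint32_t word_addr; /* byte address with the two lsbits cleared */
    uint8_t  strobes;   /* bit n set: byte lane n carries new data */
} bus_write;

static bus_write make_write(uint32_t byte_addr, unsigned size)
{
    bus_write w;
    assert(size == 1 || size == 2 || size == 4); /* byte, halfword, word */
    assert((byte_addr & (size - 1)) == 0);       /* aligned accesses only */
    w.word_addr = byte_addr & ~3u;
    /* 'size' consecutive lanes, starting at the lane chosen by the lsbits. */
    w.strobes = (uint8_t)(((1u << size) - 1u) << (byte_addr & 3u));
    return w;
}
```

A single-byte write to 0x1003 becomes word address 0x1000 with only lane 3 enabled (mask 0x8), so the target keeps the other three bytes unchanged.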

The AMBA/AXI bus cycles are described in the AMBA/AXI bus documentation on ARM's website, infocenter.arm.com, which explains in detail how each transaction works. Not very complicated at all...

0
votes

Note that the question title is only true for a couple of specific architecture versions (ARMv3 and ARMv4, in 32-bit modes) - from ARMv4T, the LSB of branch addresses is used for ARM/Thumb interworking, as @dwelch has noted. On v6M and v7M an attempt to switch instruction sets is not ignored, and results in a fault.

Prior to v3 when the address space was only 26 bits and there was no dedicated CPSR, the bottom two bits of r15 were used to store the processor mode (with the flags in the top 6 bits) - a flag-setting write to r15 would update both the PC and the PSR bits.
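The 26-bit r15 layout described above (mode in bits 1:0, word-aligned address in bits 25:2, flags in bits 31:26) can be sketched as a decoder in C. The names `r15_fields` and `decode_r15_26bit` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the fields packed into r15 on 26-bit ARM. */
typedef struct {
    uint32_t pc;    /* bits 25:2, kept in place as a word-aligned address */
    unsigned mode;  /* bits 1:0, processor mode */
    unsigned flags; /* bits 31:26, the six flag bits */
} r15_fields;

static r15_fields decode_r15_26bit(uint32_t r15)
{
    r15_fields f;
    f.pc    = r15 & 0x03FFFFFCu;
    f.mode  = r15 & 0x3u;
    f.flags = r15 >> 26;
    /* Sanity check: the three fields together cover all 32 bits. */
    assert((f.pc | f.mode | ((uint32_t)f.flags << 26)) == r15);
    return f;
}
```

For instance, an r15 value of 0xF0008003 decodes to an address of 0x8000, a mode field of 3 and a flags field of 0x3C.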