First and foremost, the ARM7TDMI does not support the Thumb-2 extensions; it implements the original Thumb instruction set.
So why not just try it?
.thumb
@.syntax unified
b 0x50
Run these commands:
arm-whatever-whatever-as b.s -o b.o
arm-whatever-whatever-objdump -D b.o
and get this output:
0: e7fe b.n 50 <*ABS*0x50>
So that is a T2 encoding. As the newer docs show, this encoding is supported by ARMv4T, ARMv5T*, ARMv6* and ARMv7, and the ARM7TDMI is an ARMv4T.
We see that 0xE7 matches the 11100 at the start of that instruction definition.
So imm11 is 0x7FE, which is basically an encoding of a branch to address 0x000, since this isn't linked with anything. How do I know that?
.thumb
b skip
nop
nop
nop
nop
nop
skip:
00000000 <skip-0xc>:
0: e004 b.n c <skip>
2: 46c0 nop ; (mov r8, r8)
4: 46c0 nop ; (mov r8, r8)
6: 46c0 nop ; (mov r8, r8)
8: 46c0 nop ; (mov r8, r8)
a: 46c0 nop ; (mov r8, r8)
0xE004 starts with 11100, so that is branch encoding T2, and imm11 is 4.
We need to reach from 0 to 0xC. The PC is two instructions (4 bytes) ahead when the offset is applied. The docs say:
Encoding T2: even numbers in the range -2048 to 2046
and
PC, the program counter
- When executing an ARM instruction, PC reads as the address of the current instruction plus 8.
- When executing a Thumb instruction, PC reads as the address of the current instruction plus 4.
So that all makes sense: 0xC - 0x4 = 8. Only even offsets are allowed, and it makes no sense to branch into the middle of an instruction anyway, so the field holds the offset divided by 2. Thumb instructions are two bytes, so the offset is counted in halfwords, not bytes. That gives 4:
0xE004
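The arithmetic above can be checked with a short Python sketch. The name encode_b_t2 is made up for illustration. It only applies the rules quoted above: PC = instruction address + 4, an even offset in -2048..2046, and the offset stored divided by 2 in imm11 under the 11100 prefix.

```python
# Sketch: encode the 16-bit Thumb unconditional branch (encoding T2,
# bit pattern 11100:imm11) from the branch's own address to a target.

def encode_b_t2(instr_addr: int, target: int) -> int:
    """Return the 16-bit halfword for 'b target' placed at instr_addr."""
    pc = instr_addr + 4            # Thumb: PC reads as current instruction + 4
    offset = target - pc           # byte offset the CPU adds to PC
    if offset % 2:
        raise ValueError("target must be halfword aligned")
    if not -2048 <= offset <= 2046:
        raise ValueError("branch out of range for encoding T2")
    imm11 = (offset >> 1) & 0x7FF  # offset in halfwords, 11-bit two's complement
    return 0xE000 | imm11          # 0xE000 puts 11100 in the top five bits


print(hex(encode_b_t2(0x0, 0xC)))  # the 'b skip' example
```

The same function also gives 0xE7FE for a branch from an address to itself, since the -4 offset exactly cancels the PC's +4.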
Here is one way to generate the T4 encoding:
.thumb
.syntax unified
b skip
nop
nop
nop
nop
nop
skip:
00000000 <skip-0xe>:
0: f000 b805 b.w e <skip>
4: 46c0 nop ; (mov r8, r8)
6: 46c0 nop ; (mov r8, r8)
8: 46c0 nop ; (mov r8, r8)
a: 46c0 nop ; (mov r8, r8)
c: 46c0 nop ; (mov r8, r8)
The T4 encoding of the branch has 11110 at the top of the first halfword. On ARMv6T2 and ARMv7 that marks a 32-bit Thumb-2 instruction. On older cores such as the ARMv4T ARM7TDMI, 11110 is only the first half of a BL pair, so this B.W is not available there.
The second halfword is 10x1 (J1 in the x position), and we see a 0xB, so it looks good: this is a Thumb-2 extended branch.
S is 0, imm10 is 0, J1 is 1, J2 is 1, and imm11 is 5.
I1 = NOT(J1 EOR S); I2 = NOT(J2 EOR S); imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);
1 EOR 0 is 1, and NOT of that gives 0, so I1 and I2 are both zero.
S is zero and imm10 is zero, so in this case we only need to look at imm11, as a positive number.
The PC is four ahead when executing, so 0xE - 0x4 = 0xA.
0xA / 2 = 0x5, and that is our branch offset: PC + (5*2).
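Here is a Python sketch of the same decoding. It assumes the T4 field layout described above: hw1 = 11110:S:imm10 and hw2 = 1:0:J1:1:J2:imm11. decode_b_t4 is a hypothetical helper name, not part of any tool.

```python
# Sketch: split a 32-bit Thumb-2 B.W (encoding T4) into its fields and
# compute the target via imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32).

def decode_b_t4(instr_addr: int, hw1: int, hw2: int) -> int:
    """Return the branch target of B.W halfwords hw1, hw2 at instr_addr."""
    if hw1 >> 11 != 0b11110:
        raise ValueError("first halfword must start with 11110")
    if hw2 & 0xD000 != 0x9000:
        raise ValueError("second halfword must match 10x1 (B.W)")
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)              # I1 = NOT(J1 EOR S)
    i2 = 1 - (j2 ^ s)              # I2 = NOT(J2 EOR S)
    # Concatenate S:I1:I2:imm10:imm11:'0' into a 25-bit value...
    raw = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    # ...then sign-extend from bit 24 (S is the sign bit).
    imm32 = raw - (1 << 25) if s else raw
    return instr_addr + 4 + imm32  # PC (address + 4) plus the offset


print(hex(decode_b_t4(0x0, 0xF000, 0xB805)))  # the b.w skip at address 0
```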
.syntax unified
.thumb
b.w skip
nop
here:
nop
nop
nop
nop
skip:
b.w here
00000000 <here-0x6>:
0: f000 b805 b.w e <skip>
4: 46c0 nop ; (mov r8, r8)
00000006 <here>:
6: 46c0 nop ; (mov r8, r8)
8: 46c0 nop ; (mov r8, r8)
a: 46c0 nop ; (mov r8, r8)
c: 46c0 nop ; (mov r8, r8)
0000000e <skip>:
e: f7ff bffa b.w 6 <here>
S is 1, imm10 is 0x3FF, J1 is 1, J2 is 1, and imm11 is 0x7FA.
1 EOR 1 is 0, and NOT of that gives 1 for I1, and the same for I2.
imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);
S is 1, so this sign-extends a 1: every bit except the low few is one. The sign-extended imm11 field is 0xFFFFFFFA, or -6 halfwords (instructions). With the trailing '0' appended, imm32 is 0xFFFFFFF4, or -12 bytes.
So the offset, counted in halfwords, is ((0xE + 4) - 6)/2 = 6 back, which matches. Or look at it another way:
from the instruction encoding, PC - (6*2) = (0xE + 4) - 12 = 6, so it branches to 0x6.
So if you wanted to branch to, say, 0x70 and the address of the instruction is 0x12, then your offset is 0x70 - (0x12 + 4) = 0x5A bytes, or 0x2D halfwords. From the skip example we know the trick for a short forward branch is to make S 0 and J1 and J2 1:
0x12: 0xF000 0xB82D branch to 0x70
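Going the other way, here is a hedged sketch of a T4 encoder (encode_b_t4 is a hypothetical name). It inverts I1 = NOT(J1 EOR S) as J1 = NOT(I1) EOR S. That is where the "make J1 and J2 a 1" trick for short forward branches comes from. It reproduces the halfwords from the listings above, plus the 0x12 to 0x70 case.

```python
# Sketch: build the two halfwords of a Thumb-2 B.W (encoding T4).

def encode_b_t4(instr_addr: int, target: int) -> tuple[int, int]:
    """Return (hw1, hw2) for 'b.w target' placed at instr_addr."""
    offset = target - (instr_addr + 4)      # PC reads as address + 4
    if offset % 2:
        raise ValueError("target must be halfword aligned")
    if not -(1 << 24) <= offset <= (1 << 24) - 2:
        raise ValueError("branch out of range for encoding T4")
    v = offset & 0x1FFFFFF                  # 25-bit two's complement of the offset
    s = (v >> 24) & 1
    i1 = (v >> 23) & 1
    i2 = (v >> 22) & 1
    imm10 = (v >> 12) & 0x3FF
    imm11 = (v >> 1) & 0x7FF
    j1 = (1 - i1) ^ s                       # from I1 = NOT(J1 EOR S)
    j2 = (1 - i2) ^ s                       # from I2 = NOT(J2 EOR S)
    hw1 = 0xF000 | (s << 10) | imm10        # 11110:S:imm10
    hw2 = 0x9000 | (j1 << 13) | (j2 << 11) | imm11  # 1:0:J1:1:J2:imm11
    return hw1, hw2


for src, dst in [(0x0, 0xE), (0xE, 0x6), (0x12, 0x70)]:
    hw1, hw2 = encode_b_t4(src, dst)
    print(f"{src:#x}: {hw1:04x} {hw2:04x} -> {dst:#x}")
```

For any offset small enough that S, I1 and I2 are all 0, J1 and J2 come out as 1. That gives the 0xF000 0xB8xx pattern seen above.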
So now, knowing that, we can go back to this:
0: e7fe b.n 50 <*ABS*0x50>
The offset is a sign-extended 0x7FE, or 0xFFFFFFFE halfwords. 0xFFFFFFFE*2 + 4 =
0xFFFFFFFC + 4 = 0x00000000, added to the instruction's own address, so this is a branch to 0, the instruction itself.
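Decoding T2 in Python makes the "branch to itself" reading explicit. imm32 = SignExtend(imm11:'0', 32) is a 12-bit two's-complement byte offset, and 0x7FE gives -4, which cancels the PC's +4. decode_b_t2 is a hypothetical helper for illustration.

```python
# Sketch: decode a 16-bit Thumb unconditional branch (encoding T2).

def decode_b_t2(instr_addr: int, halfword: int) -> int:
    """Return the target of the T2 branch 'halfword' located at instr_addr."""
    if halfword >> 11 != 0b11100:
        raise ValueError("not a T2 unconditional branch")
    imm11 = halfword & 0x7FF
    offset = imm11 << 1          # imm11:'0', a 12-bit value
    if offset & 0x800:           # sign-extend from bit 11
        offset -= 0x1000
    return instr_addr + 4 + offset


print(hex(decode_b_t2(0x0, 0xE7FE)))  # branches to itself at 0x0
print(hex(decode_b_t2(0x2, 0xE7FE)))  # same encoding after a nop: itself at 0x2
```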
Add a nop:
.thumb
nop
b 0x50
00000000 <.text>:
0: 46c0 nop ; (mov r8, r8)
2: e7fe b.n 50 <*ABS*0x50>
Same encoding.
So the disassembly implies an absolute value of 0x50, but that value is not encoded in the instruction. Linking doesn't help; it just complains:
(.text+0x0): relocation truncated to fit: R_ARM_THM_JUMP11 against `*ABS*0x50'
this
.thumb
nop
b 0x51
gives the same encoding.
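So basically the assembler does not encode 0x50 at all. The disassembly shows it emits a relocation against the absolute symbol *ABS*0x50 and leaves the rest to the linker. The linker then cannot fit it into 11 bits, likely because 0x50 is more than about 2KB away from wherever .text ends up. So this syntax does not do what it appears to.
I hope your question was really about the encoding of a branch to some address, rather than that exact syntax.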
ARM is not like some other instruction sets: branches with an immediate offset are always PC-relative. If you can reach the destination with the encoding, you get a branch. Otherwise you have to use a bx, a pop, or one of the other ways to load the PC with an absolute value.
Since the T2 encoding from the docs can only reach about 2KB ahead (2046 bytes past the PC), put more than 2048 nops between the branch and its destination:
b.s: Assembler messages:
b.s:5: Error: branch out of range
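Here is a minimal sketch of the reach check. It assumes the layout of the earlier examples: the branch at address 0 and the label right after n two-byte nops. It uses the 16-bit T2 form, since ARMv4T has no B.W. The names fits and skip_target are hypothetical. By this arithmetic the T2 form already runs out at 1025 nops. The T4 range is shown only for comparison, and only exists on Thumb-2 cores.

```python
# Sketch: can a PC-relative Thumb branch at instr_addr reach target?

T2_RANGE = (-2048, 2046)                # 16-bit B, encoding T2
T4_RANGE = (-(1 << 24), (1 << 24) - 2)  # 32-bit B.W, encoding T4 (Thumb-2 only)


def fits(instr_addr: int, target: int, rng: tuple[int, int]) -> bool:
    offset = target - (instr_addr + 4)  # PC reads as address + 4
    return offset % 2 == 0 and rng[0] <= offset <= rng[1]


def skip_target(n_nops: int) -> int:
    # hypothetical layout: 2-byte 'b skip' at 0, n 2-byte nops, then 'skip:'
    return 2 + 2 * n_nops


for n in (1024, 1025, 2049):
    print(n, fits(0, skip_target(n), T2_RANGE), fits(0, skip_target(n), T4_RANGE))
```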
Maybe this is what you are looking to do?
.thumb
mov r0,#0x51
bx r0
00000000 <.text>:
0: 2051 movs r0, #81 ; 0x51
2: 4700 bx r0
That branches to absolute address 0x50; bit 0 of 0x51 is set, which keeps BX in Thumb state. For that specific address there is no need for Thumb-2 extensions.
.thumb
ldr r0,=0x12345679
bx r0
00000000 <.text>:
0: 4800 ldr r0, [pc, #0] ; (4 <.text+0x4>)
2: 4700 bx r0
4: 12345679 eorsne r5, r4, #126877696 ; 0x7900000
That branches to address 0x12345678, and the same pattern reaches any other possible address.
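For the ARMv4T case, the choice between the two sequences above can be sketched as a tiny Python helper that prints the assembly. The helper name absolute_branch and its register default are hypothetical. It assumes a Thumb destination, so bit 0 is set and BX stays in Thumb state. It also assumes a low register, since Thumb-1 MOV only takes an 8-bit immediate into r0-r7.

```python
# Sketch: pick an ARMv4T Thumb sequence that branches to an absolute address.

def absolute_branch(addr: int, reg: str = "r0") -> list[str]:
    """Return assembly lines that branch to the Thumb code at addr."""
    if addr % 2:
        raise ValueError("Thumb code addresses are halfword aligned")
    value = addr | 1                      # bit 0 set: BX stays in Thumb state
    if value <= 0xFF:
        load = f"mov {reg},#0x{value:x}"  # fits Thumb-1 MOV's 8-bit immediate
    else:
        load = f"ldr {reg},=0x{value:x}"  # assembler puts value in a literal pool
    return [load, f"bx {reg}"]


for a in (0x50, 0x12345678):
    print("\n".join(absolute_branch(a)))
```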
imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32), from which you can work backwards to a specific encoding. Or perhaps cheat and go look at the source of binutils, LLVM, or any other open-source assembler... – Notlikethat