I banged out a working example for you. It is for a TI part that is Cortex-M4 based, but not the same chip/board you have; I don't have that board/chip handy. The peripherals are not necessarily the same as on your TI part, but the Cortex-M4 handling should be the same.
Mine is the MSP432P401R launchpad. As you should know, before you start you want the datasheet for the launchpad, the datasheet for the MCU, the technical reference manual for the MCU, the ARM Cortex-M4 technical reference manual, and the ARMv7-M architectural reference manual.
The code below is completely stand-alone; all you need to add is a GNU toolchain for ARM from the last 10 years or so. That completely removes interference from any other code. Every line of code you add adds risk. If you can get it to work at this level, then you know you understand the CPU and peripheral well enough to move forward. Adding it to a larger project, or to something using a library, adds risk from that other code, but you at least have the warm fuzzy feeling that you know this peripheral and how you are using it. So if things don't work, either you ported the stand-alone experiment wrong or something in the larger program is interfering.
I use openocd to talk to this part. At the time I first got this board, however many years ago (can you even get this board any more?), flashing without their sandbox involved me making my own programs to do that. If the (user application) flash is erased, then the built-in bootloader runs, which changes the clocks and other things. So I have programmed the flash with basically an infinite loop program: it turns off the WDT and sits in an infinite loop. Now I can do development in SRAM using openocd, quite easily:
reset halt
load_image notmain.sram.elf
resume 0x01000000
and repeat those three lines each time I want to try another experiment.
I tend to start with an LED blinker, use the SysTick or a timer with the LED to determine/confirm the internal clock rate, then move on to the UART, where I have a simple routine that prints hex numbers. It's a dozen or so lines of code, not horribly massive like printf, and does everything I need.
Interrupts are an advanced topic no matter how many decades of experience you have, so when diving into them you ideally need a way to visualize what is going on. An LED works in a pinch, but a UART is far better. You want to start with the peripheral standalone if possible, polling.
In this case I am using TIMER32 number 1. TI's style is to have the memory space addresses in the datasheet and then how to use them in the reference manual. TI has both a raw interrupt status (RIS) register and a masked interrupt status (MIS) register.
Starting with the interrupt masked off in the peripheral, learn the timer, the interrupt, and how to clear it by polling the RIS register.
Once you have mastered that, enable the interrupt in the peripheral, ensuring that you have not enabled it into the core of the processor in any way, and see that both the masked interrupt status (in my case) and bit 22 in ICSR, ISRPENDING, get set. That confirms the interrupt is making it from the chip vendor's logic to the ARM core.
TI's style is to also have the interrupt table list in the datasheet. For the timer I am using I see:
INTISR[25] Timer32_INT1
So next I spam NVIC_ISER0, turning all the bits on (this is a targeted test, nothing else should be going on in the chip). I have executed cpsid i to keep the interrupts out of the core.
Then I examine the ICSR after the interrupt, and in my case the VECTPENDING field is 0x29, or 41, which is 16+25. That matches the datasheet. If I now change NVIC_ISER0 to 1<<25 only and repeat, I get the same answer: VECTPENDING is 0x29. Now I can move forward.
Here is where you have choices and have to master your tools. I went ahead and skipped using the power-on VTOR of 0x00000000 and the vector table in flash and moved to SRAM, which is what you want and also how I am developing. First, from the ARM documentation you see that VTOR has to be aligned. I set it to the beginning of SRAM, 0x01000000, and set up my entry code (SRAM style, not flash style) to resemble a vector table, but with a branch over the table in place of the stack pointer init value. That takes us into the example:
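The ICSR check described above can be sketched as a few helper functions. This is only an illustration: the helper names are made up for this example, and the field positions (ISRPENDING at bit 22, VECTPENDING starting at bit 12) are taken from the ARMv7-M architectural reference manual, so confirm them against the manual for your core.

```c
#include <stdint.h>

/* ICSR field positions per the ARMv7-M architectural reference manual.
 * The helper names below are hypothetical, not from any vendor header. */
#define ICSR_ISRPENDING_BIT     22u
#define ICSR_VECTPENDING_SHIFT  12u
#define ICSR_VECTPENDING_MASK   0x1FFu /* architecturally 9 bits; a core may implement fewer */

/* 1 if an external interrupt is pending, else 0. */
static unsigned int icsr_isrpending(uint32_t icsr)
{
    return (icsr >> ICSR_ISRPENDING_BIT) & 1u;
}

/* Exception number of the highest priority pending exception (0 if none). */
static unsigned int icsr_vectpending(uint32_t icsr)
{
    return (icsr >> ICSR_VECTPENDING_SHIFT) & ICSR_VECTPENDING_MASK;
}

/* External interrupts start at exception 16, so IRQ n is exception 16+n. */
static int exception_to_irq(unsigned int exception)
{
    return (int)exception - 16;
}
```

On the target you would pass in the value read with GET32(ICSR); a VECTPENDING of 0x29 then maps to interrupt 25.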
sram.s
.thumb
.thumb_func
.global _start
_start:
b reset
nop
.word loop /*0x0004 1 Reset */
.word loop /*0x0008 2 NMI */
.word loop /*0x000C 3 HardFault */
.word loop /*0x0010 4 MemManage */
.word loop /*0x0014 5 BusFault */
.word loop /*0x0018 6 UsageFault */
.word loop /*0x001C 7 Reserved */
.word loop /*0x0020 8 Reserved */
.word loop /*0x0024 9 Reserved */
.word loop /*0x0028 10 Reserved */
.word loop /*0x002C 11 SVCall */
.word loop /*0x0030 12 DebugMonitor */
.word loop /*0x0034 13 Reserved */
.word loop /*0x0038 14 PendSV */
.word loop /*0x003C 15 SysTick */
.word loop /*0x0040 16 External interrupt 0 */
.word loop /*0x0044 17 External interrupt 1 */
.word loop /*0x0048 18 External interrupt 2 */
.word loop /*0x004C 19 External interrupt 3 */
.word loop /*0x0050 20 External interrupt 4 */
.word loop /*0x0054 21 External interrupt 5 */
.word loop /*0x0058 22 External interrupt 6 */
.word loop /*0x005C 23 External interrupt 7 */
.word loop /*0x0060 24 External interrupt 8 */
.word loop /*0x0064 25 External interrupt 9 */
.word loop /*0x0068 26 External interrupt 10 */
.word loop /*0x006C 27 External interrupt 11 */
.word loop /*0x0070 28 External interrupt 12 */
.word loop /*0x0074 29 External interrupt 13 */
.word loop /*0x0078 30 External interrupt 14 */
.word loop /*0x007C 31 External interrupt 15 */
.word loop /*0x0080 32 External interrupt 16 */
.word loop /*0x0084 33 External interrupt 17 */
.word loop /*0x0088 34 External interrupt 18 */
.word loop /*0x008C 35 External interrupt 19 */
.word loop /*0x0090 36 External interrupt 20 */
.word loop /*0x0094 37 External interrupt 21 */
.word loop /*0x0098 38 External interrupt 22 */
.word loop /*0x009C 39 External interrupt 23 */
.word loop /*0x00A0 40 External interrupt 24 */
.word timer32_handler /*0x00A4 41 External interrupt 25 */
.word loop /*0x00A8 42 External interrupt 26 */
.word loop /*0x00AC 43 External interrupt 27 */
.word loop /*0x00B0 44 External interrupt 28 */
.word loop /*0x00B4 45 External interrupt 29 */
.word loop /*0x00B8 46 External interrupt 30 */
.word loop /*0x00BC 47 External interrupt 31 */
.word loop /*0x00C0 48 External interrupt 32 */
reset:
cpsid i
ldr r0,stacktop
mov sp,r0
bl notmain
b loop
.thumb_func
loop: b .
.align
stacktop: .word 0x20008000
.thumb_func
.globl ienable
ienable:
cpsie i
bx lr
.thumb_func
.globl PUT8
PUT8:
strb r1,[r0]
bx lr
.thumb_func
.globl GET8
GET8:
ldrb r0,[r0]
bx lr
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl GET16
GET16:
ldrh r0,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl get_addr
get_addr:
ldr r0,=timer32_handler
bx lr
Your title question said assembly; I am using mixed C/asm to make it easier to read/use. You can certainly do yours all in asm if you like. Mine is not meant to be a library but a reference to see if you are doing the same things.
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void PUT8 ( unsigned int, unsigned int );
unsigned int GET8 ( unsigned int );
void PUT16 ( unsigned int, unsigned int );
unsigned int GET16 ( unsigned int );
void ienable ( void );
#define PORT_BASE 0x40004C00
#define PAOUT_L (PORT_BASE+0x02)
#define PADIR_L (PORT_BASE+0x04)
#define WDTCTL 0x4000480C
#define TIMER32_BASE 0x4000C000
#define ICSR 0xE000ED04
#define SCR 0xE000ED10
#define VTOR 0xE000ED08
#define NVIC_ISER0 0xE000E100
#define NVIC_IABR0 0xE000E300
#define NVIC_ICPR0 0xE000E280
volatile unsigned int ticks;
void timer32_handler ( void )
{
ticks^=1;
PUT8(PAOUT_L,ticks);
PUT32(TIMER32_BASE+0x0C,0);
PUT32(NVIC_ICPR0,1<<25);
}
void notmain ( void )
{
PUT16(WDTCTL,0x5A84);
PUT8(PADIR_L,GET8(PADIR_L)|0x01);
ticks=0;
PUT32(VTOR,0x01000000);
PUT32(NVIC_ISER0,1<<25);
ienable();
PUT32(TIMER32_BASE+0x08,0xA4);
}
sram.ld
MEMORY
{
ram : ORIGIN = 0x01000000, LENGTH = 0x3000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
}
And that's it, 100% of the source code for this example. All you need to do is build it:
arm-none-eabi-as --warn sram.s -o sram.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m4 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -T sram.ld sram.o notmain.o -o notmain.sram.elf
arm-none-eabi-objdump -D notmain.sram.elf > notmain.sram.list
arm-none-eabi-objcopy notmain.sram.elf notmain.sram.bin -O binary
Any of the GNU gcc/binutils cross compilers from the last decade or so should work, the arm-none-eabi style as well as the arm-whatever-linux style; this code isn't affected by the difference.
The architectural reference manual shows that the first entry in the vector table, at offset 0x0000, is the stack pointer initialization value; you can choose to use that or not. Then the exceptions start: exception 1 is reset, 2 is NMI, and so on. Exception 16 is where external (to the ARM core) interrupt 0 lands, and on down the line, so interrupt 25 lands here
.word timer32_handler /*0x00A4 41 External interrupt 25 */
at offset 0xA4 in the vector table. If you are desperate or the chip isn't well documented, you can narrow in on the offset/interrupt number either from the pending status or by simply spamming the vector table with all entries pointing at the handler. (Light an LED or something when the interrupt comes, then go into an infinite loop: a horrible handler for real-world stuff, but just fine for reverse engineering a poorly documented part.)
Before you execute anything, confirm you built things right. The entry point should be the code you expect; in this case, being SRAM, I have the entry point as instructions (that jump over what becomes my vector table once I change VTOR):
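The arithmetic behind that offset, and behind the 1<<25 written to NVIC_ISER0, can be written down as a small sketch. The function names are hypothetical; the only assumptions are the ones already stated above (four bytes per entry, external interrupt 0 at exception 16) plus the standard NVIC layout of one enable bit per interrupt, 32 per register.

```c
#include <stdint.h>

#define NVIC_ISER0 0xE000E100u

/* Byte offset of an exception's entry in the vector table.
 * Entry 0 (offset 0x0000) is the stack pointer init value. */
static uint32_t vector_offset(unsigned int exception)
{
    return exception * 4u;
}

/* External interrupt n is exception 16+n. */
static uint32_t irq_vector_offset(unsigned int irq)
{
    return vector_offset(16u + irq);
}

/* Address of the NVIC_ISERx register that holds the enable bit for this IRQ. */
static uint32_t nvic_iser_addr(unsigned int irq)
{
    return NVIC_ISER0 + 4u * (irq / 32u);
}

/* Bit mask for this IRQ within its NVIC_ISERx register. */
static uint32_t nvic_irq_mask(unsigned int irq)
{
    return 1u << (irq % 32u);
}
```

For interrupt 25 this gives offset 0xA4, register 0xE000E100 and mask 1<<25, matching the table and the C code in this answer.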
Disassembly of section .text:
01000000 <_start>:
1000000: e060 b.n 10000c4 <reset>
1000002: 46c0 nop ; (mov r8, r8)
1000004: 010000d1 ldrdeq r0, [r0, -r1]
1000008: 010000d1 ldrdeq r0, [r0, -r1]
100000c: 010000d1 ldrdeq r0, [r0, -r1]
1000010: 010000d1 ldrdeq r0, [r0, -r1]
1000014: 010000d1 ldrdeq r0, [r0, -r1]
1000018: 010000d1 ldrdeq r0, [r0, -r1]
100001c: 010000d1 ldrdeq r0, [r0, -r1]
1000020: 010000d1 ldrdeq r0, [r0, -r1]
1000024: 010000d1 ldrdeq r0, [r0, -r1]
1000028: 010000d1 ldrdeq r0, [r0, -r1]
100002c: 010000d1 ldrdeq r0, [r0, -r1]
1000030: 010000d1 ldrdeq r0, [r0, -r1]
1000034: 010000d1 ldrdeq r0, [r0, -r1]
1000038: 010000d1 ldrdeq r0, [r0, -r1]
100003c: 010000d1 ldrdeq r0, [r0, -r1]
1000040: 010000d1 ldrdeq r0, [r0, -r1]
1000044: 010000d1 ldrdeq r0, [r0, -r1]
1000048: 010000d1 ldrdeq r0, [r0, -r1]
100004c: 010000d1 ldrdeq r0, [r0, -r1]
1000050: 010000d1 ldrdeq r0, [r0, -r1]
1000054: 010000d1 ldrdeq r0, [r0, -r1]
1000058: 010000d1 ldrdeq r0, [r0, -r1]
100005c: 010000d1 ldrdeq r0, [r0, -r1]
1000060: 010000d1 ldrdeq r0, [r0, -r1]
1000064: 010000d1 ldrdeq r0, [r0, -r1]
1000068: 010000d1 ldrdeq r0, [r0, -r1]
100006c: 010000d1 ldrdeq r0, [r0, -r1]
1000070: 010000d1 ldrdeq r0, [r0, -r1]
1000074: 010000d1 ldrdeq r0, [r0, -r1]
1000078: 010000d1 ldrdeq r0, [r0, -r1]
100007c: 010000d1 ldrdeq r0, [r0, -r1]
1000080: 010000d1 ldrdeq r0, [r0, -r1]
1000084: 010000d1 ldrdeq r0, [r0, -r1]
1000088: 010000d1 ldrdeq r0, [r0, -r1]
100008c: 010000d1 ldrdeq r0, [r0, -r1]
1000090: 010000d1 ldrdeq r0, [r0, -r1]
1000094: 010000d1 ldrdeq r0, [r0, -r1]
1000098: 010000d1 ldrdeq r0, [r0, -r1]
100009c: 010000d1 ldrdeq r0, [r0, -r1]
10000a0: 010000d1 ldrdeq r0, [r0, -r1]
10000a4: 010000fd strdeq r0, [r0, -sp]
10000a8: 010000d1 ldrdeq r0, [r0, -r1]
10000ac: 010000d1 ldrdeq r0, [r0, -r1]
10000b0: 010000d1 ldrdeq r0, [r0, -r1]
10000b4: 010000d1 ldrdeq r0, [r0, -r1]
10000b8: 010000d1 ldrdeq r0, [r0, -r1]
10000bc: 010000d1 ldrdeq r0, [r0, -r1]
10000c0: 010000d1 ldrdeq r0, [r0, -r1]
010000c4 <reset>:
10000c4: b672 cpsid i
10000c6: 4803 ldr r0, [pc, #12] ; (10000d4 <stacktop>)
10000c8: 4685 mov sp, r0
10000ca: f000 f835 bl 1000138 <notmain>
10000ce: e7ff b.n 10000d0 <loop>
010000d0 <loop>:
10000d0: e7fe b.n 10000d0 <loop>
All the entries are the address of the handler ORRed with 1, as required.
In GNU assembler, notice that to get loop to work properly you need to precede the label with .thumb_func to tell the tool the next label is a function (so set the lsbit when I ask for its address):
.thumb_func
loop: b .
Without the .thumb_func there the address would be wrong and the handler would not get called. Another exception would happen instead, and if that handler address is also wrong it is really game over.
If you want to manually build the table, understand that at the time this answer was written there is a pending bug at GNU showing that ADR does not work right. It is a pseudo instruction and poorly documented in the architectural reference manual, so it is up to the assembler, which defines the assembly language (assembly is defined by the tool, not the target nor the architecture; the machine code is defined by the architecture, assembly language is a free-for-all). In the case of GNU assembler, the documentation claims that when interwork is set it will provide an address with the lsbit set so that a bx rd can be used, but that is false for forward-referenced labels. Other assemblers may implement ADR however they wish and you should check their definition. When in doubt, ORR the lsbit if you feel the need to use ADR (don't add, or). I would certainly avoid the instruction altogether, for example:
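If you would rather check this mechanically than by eyeballing the disassembly, a sanity check along these lines is one option. It is a sketch with made-up function names; it only tests the lsbit rule described above and knows nothing about whether the addresses point at real code.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the entries in table[first..count-1] whose lsbit is clear.
 * Pass first=1 to skip slot 0, which holds the stack pointer value
 * (or, in the SRAM layout of this answer, instructions). */
static size_t count_non_thumb_entries(const uint32_t *table, size_t first, size_t count)
{
    size_t bad = 0;
    for (size_t i = first; i < count; i++) {
        if ((table[i] & 1u) == 0u) {
            bad++;
        }
    }
    return bad;
}

/* The address the handler code actually lives at: the entry with the lsbit stripped. */
static uint32_t entry_code_address(uint32_t entry)
{
    return entry & ~(uint32_t)1u;
}
```

With the values from the listing above, 0x010000d1 is the entry for loop at 0x010000d0; an entry of 0x010000d0 would be counted as bad.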
.thumb_func
.globl get_addr
get_addr:
ldr r0,=timer32_handler
bx lr
010000f4 <get_addr>:
10000f4: 4800 ldr r0, [pc, #0] ; (10000f8 <get_addr+0x4>)
10000f6: 4770 bx lr
10000f8: 010000fd strdeq r0, [r0, -sp]
which worked great. (Note this is disassembly; the strdeq is just the disassembler trying to make sense of the value 010000fd, which is what you should focus on.) The tools did the work for me, providing the address in the correct form that I needed. I am still relying on the tools and knowing/hoping they work, but using something that has worked and does work with at least gas/binutils.
Notice that for safety my bootstrap starts by disabling interrupts, then sets up the stack pointer and launches the C entry point. Since I have no .data and do not require .bss to be zeroed, the linker script and bootstrap are that trivial. I have multiple reasons for abstracting read/write access; you can do it your way (be careful that the popular ways are not necessarily C compliant, and expect those habits/fads to fail some day).
For these parts (TI in general, it seems) you want to disable the watchdog timer early on, otherwise it resetting the part will drive you crazy trying to figure out what is going on.
My board has an LED on it; I set that port pin to be an output.
I have a variable that I use to keep track of interrupts so I can blink the LED on/off every interrupt.
Since I let the tools do the work, I set the VTOR to the beginning of SRAM, which is a properly aligned address.
I enable the interrupt in the NVIC.
I enable interrupts to the core.
I set up the peripheral and enable its interrupts.
Since I wrote the bootstrap and know it simply lands in an infinite loop when the C entry point function returns, I can just return and leave the processor in that infinite loop, waiting for interrupts and letting the interrupt handler do the rest of the work.
In the handler I toggle the LED and then clear the interrupt, working from the peripheral toward the core. YMMV if you do it the other way.
That's it. It sounds like you are doing these steps, but since you have not provided the information required to see what you are really doing, I can only guess as to what step is missing, has the wrong value, or is in the wrong place.
I can't emphasize enough that, where possible in any chip/processor, you should use polling as much as you can, experimenting with targeted tests to figure out the peripheral and follow the interrupt through however many layers of interrupt gates there are, only enabling interrupts into the core after you have mastered as much as possible without actually causing the processor to interrupt. Doing it all at once makes the development take many times longer on average and is often significantly more painful.
My hope is that this long answer triggers a simple three-second fix to your code; if not, you can at least try to develop from it a test for your chip. I have not posted the UART-enabled version I used to discover how this part worked, but using that path it was pretty easy to figure the peripheral out, then walk the interrupt toward the core, have everything ready to create and clear interrupts, then lastly enable the interrupt into the core, and it worked first time (a bit of luck there, it doesn't always happen that way).
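For reference, here is one way to read the 0xA4 that notmain writes to TIMER32_BASE+0x08. This is a sketch under an assumption: the bit positions below follow the ARM SP804-style control register, which the offsets used in this example (control at 0x08, interrupt clear at 0x0C) appear to match. Check them against the MSP432 technical reference manual before relying on them. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Decoded view of a Timer32 control value.
 * ASSUMPTION: SP804-style bit layout; verify against the MCU reference manual. */
struct t32_control {
    unsigned int oneshot;      /* bit 0: 1 = one-shot, 0 = wrapping */
    unsigned int size32;       /* bit 1: 1 = 32-bit counter, 0 = 16-bit */
    unsigned int prescale_div; /* bits 3:2: 1, 16 or 256 (0 = reserved encoding) */
    unsigned int irq_enable;   /* bit 5: interrupt enabled in the peripheral */
    unsigned int periodic;     /* bit 6: 1 = periodic, 0 = free-running */
    unsigned int enable;       /* bit 7: timer running */
};

static struct t32_control t32_decode(uint32_t value)
{
    static const unsigned int div[4] = { 1u, 16u, 256u, 0u };
    struct t32_control c;
    c.oneshot      = (value >> 0) & 1u;
    c.size32       = (value >> 1) & 1u;
    c.prescale_div = div[(value >> 2) & 3u];
    c.irq_enable   = (value >> 5) & 1u;
    c.periodic     = (value >> 6) & 1u;
    c.enable       = (value >> 7) & 1u;
    return c;
}
```

Under that assumption, 0xA4 decodes as timer enabled, interrupt enabled, divide by 16, 16-bit free-running counter, which is a slow enough rate to see the LED blink.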
EDIT
But if I do not reallocate the vector table into SRAM, how to I route the corresponding interrupt to its handler?
You simply add the label to the vector table:
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hello
.word world
.thumb_func
reset: b .
.thumb_func
hello: b .
.thumb_func
world: b .
arm-none-eabi-as flash.s -o flash.o
arm-none-eabi-ld -Ttext=0 flash.o -o flash.elf
arm-none-eabi-objdump -D flash.elf
flash.elf: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000011 andeq r0, r0, r1, lsl r0
8: 00000013 andeq r0, r0, r3, lsl r0
c: 00000015 andeq r0, r0, r5, lsl r0
00000010 <reset>:
10: e7fe b.n 10 <reset>
00000012 <hello>:
12: e7fe b.n 12 <hello>
00000014 <world>:
14: e7fe b.n 14 <world>
No need to copy and modify the vector table; everything is in place in flash.
I have to wonder why you don't know what your handlers are at build time and have to add things at runtime; this is an MCU. Maybe you have a generic bootloader? But in that case you wouldn't need to preserve any of the prior handlers. If you must move the table to SRAM and add an entry at runtime, that is fine, but you have to ensure that 1) VTOR is supported by the core, and by the implementation of that core, that you are using, and 2) your entry is correct per the rules for this architecture.
Get either of those wrong and it won't work. Then of course there is the peripheral setup, the enabling of interrupts through the gates to the core, enabling through the core to the processor, and clearing the interrupt in the handler so it doesn't fire infinitely.
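If you do go the runtime route, the two rules above can be sketched as follows. The function names are hypothetical. The alignment rule is the one from the ARM documentation: take the table size in bytes (16 system entries plus one per external interrupt, four bytes each), round it up to a power of two, with a minimum of 128 bytes. The entry rule is the lsbit requirement discussed earlier.

```c
#include <stdint.h>

/* Required VTOR alignment in bytes for a table with num_irqs external interrupts:
 * (16 + num_irqs) words rounded up to a power of two, never less than 128 bytes. */
static uint32_t vtor_alignment(unsigned int num_irqs)
{
    uint32_t bytes = (16u + num_irqs) * 4u;
    uint32_t align = 128u;
    while (align < bytes) {
        align <<= 1;
    }
    return align;
}

/* 1 if addr is a legal table base for that many interrupts, else 0. */
static int vtor_is_aligned(uint32_t addr, unsigned int num_irqs)
{
    return (addr & (vtor_alignment(num_irqs) - 1u)) == 0u;
}

/* Put a handler into a RAM copy of the table, forcing the lsbit on (ORR, not add). */
static void install_irq_handler(uint32_t *table, unsigned int irq, uint32_t handler_addr)
{
    table[16u + irq] = handler_addr | 1u;
}
```

For example, a part with 64 external interrupts has an 80-word table and needs 512-byte alignment, which 0x01000000 satisfies. On the target, table would be the SRAM copy and handler_addr the address of your handler; only after both checks pass would you write the table base to VTOR.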