Bootloader Strategy for Corrupt Applications

Question

I've implemented a bootloader for a Kinetis ARM Cortex-M4 microcontroller.

The main application (starting at 0x10000) is re-programmed via the bootloader over a custom RS232 interface. I've implemented jumpToApplication and jumpToBootloader functions from the bootloader and application perspectives and all works fine so far.

One thing I'm keen to understand is what to do in the event of a corrupt main application.

The bootloader currently checks the stack-pointer and program-counter of the main application before deciding whether to jump. However, if the main application is corrupt then one of two issues will occur:

The main application will hang and make it difficult to re-program
The microcontroller will reboot and will be stuck in a bootloader > application > bootloader (etc) loop
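
For reference, the stack-pointer and program-counter check mentioned above could look roughly like the sketch below. The memory ranges and the names vector_table_head_t and app_vectors_plausible are assumptions made for illustration; only the 0x10000 application base comes from this question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory map; the real values belong in the linker script. */
#define APP_FLASH_START 0x00010000u /* application base, from the question */
#define APP_FLASH_END   0x00080000u /* assumed end of flash (exclusive)    */
#define RAM_START       0x1FFF0000u /* assumed start of SRAM               */
#define RAM_END         0x20010000u /* assumed end of SRAM (exclusive)     */

/* The first two words of a Cortex-M vector table: the initial stack
 * pointer and the address of the reset handler. */
typedef struct {
    uint32_t initial_sp;
    uint32_t reset_handler;
} vector_table_head_t;

/* Returns true if the two words look like a real vector table.
 * Erased flash (0xFFFFFFFF in both words) is rejected. */
static bool app_vectors_plausible(const vector_table_head_t *vt)
{
    /* The stack pointer must be word aligned and point into RAM.
     * It may equal RAM_END because the stack grows downwards. */
    if ((vt->initial_sp & 0x3u) != 0u) {
        return false;
    }
    if (vt->initial_sp <= RAM_START || vt->initial_sp > RAM_END) {
        return false;
    }

    /* The reset handler must have the Thumb bit set and lie inside
     * the application's flash region. */
    if ((vt->reset_handler & 0x1u) == 0u) {
        return false;
    }
    uint32_t handler_addr = vt->reset_handler & ~0x1u;
    return handler_addr >= APP_FLASH_START && handler_addr < APP_FLASH_END;
}
```

This only rejects an erased or obviously wrong vector table. An image that was partially programmed may still pass it if the first words were written before the interruption.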

I have a SharedData structure which allows me to share data (via a fixed RAM location) between both the bootloader and application. I have considered adding a rebootCounter to this structure which would be incremented upon the HardFaultInterrupt being triggered in the main application.

This value could be tested in the bootloader and, depending on the counter value, a decision could be made as to whether to stay in the bootloader or try to launch the application.
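
A minimal sketch of that counter logic, assuming a magic word is added to SharedData so that the undefined RAM contents after a cold power-up are not mistaken for a real count. SHARED_MAGIC, MAX_FAILED_BOOTS and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHARED_MAGIC     0xB007C0DEu /* hypothetical "structure initialised" marker */
#define MAX_FAILED_BOOTS 3u          /* hypothetical threshold */

/* Lives at a fixed RAM address that neither image's startup code clears. */
typedef struct {
    uint32_t magic;
    uint32_t rebootCounter;
} SharedData;

/* Bootloader side: returns true if the application should be tried. */
static bool shared_should_launch_app(SharedData *sd)
{
    if (sd->magic != SHARED_MAGIC) {
        /* Cold boot: RAM contents are undefined, so start from zero. */
        sd->magic = SHARED_MAGIC;
        sd->rebootCounter = 0u;
    }
    return sd->rebootCounter < MAX_FAILED_BOOTS;
}

/* Application side: called from the hard fault handler before resetting. */
static void shared_record_fault(SharedData *sd)
{
    if (sd->rebootCounter < UINT32_MAX) {
        sd->rebootCounter++;
    }
}

/* Application side: called once the application considers itself healthy. */
static void shared_mark_healthy(SharedData *sd)
{
    sd->rebootCounter = 0u;
}
```

Note that this only counts hard faults. An application that hangs without faulting never increments the counter, so the first issue listed above would need something else, such as a watchdog reset, to be detected.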

Are there more "industry standard" ways of dealing with this?

UPDATE

To clarify, the ultimate reason for asking this question is to cover the following scenario:

Bootloader is programmed into the device during production phase via JTAG
Main application (latest build) is loaded during testing phase
During the testing phase, there is a power-cut or connection issue and the device is only partially programmed
When power is applied again, the bootloader will "assume" that there is a valid program in the main part of flash and will "jump" to this application
The microcontroller is now stuck in no man's land with no way of re-loading flash via the bootloader again without opening up the product's enclosure and re-flashing the chip via JTAG - not something we can do when the product is in the field.

During the bootloader programming phase, the firmware is programmed and validated byte-by-byte to ensure that there is no corruption during the data transfer. If corruption occurs during this phase (bad packet due to USB hub issue, for example) then the bootloader will continue to accept re-programming commands.

UPDATE #2

The following post seems to be thinking along similar lines:

https://interrupt.memfault.com/blog/how-to-write-a-bootloader-from-scratch

The bootloader's only task should be to program the main application flash. Why would it need to communicate? It sounds as if you plan to execute both simultaneously... why would you check the "main application SP and PC"? If the bootloader finds that the main application flash is corrupt, then try to program it again. That's about it. — Lundin
"how do I determine if the main application flash is corrupt without actually running the main application" Ideally by comparing the flash byte by byte. Or if this isn't possible, a CRC32. I don't see how you can " get back to bootloader from the application", where does the bootloader store the application in the meantime? In another section of flash? Why would that flash be any better? And what are you actually trying to protect against - data retention? — Lundin
The flash programming sequence already performs a byte-by-byte validation when writing the application into flash (starting at address 0x10000). What I am trying to cover is the case where power is killed during programming: when the board is powered back on, how do I stop the application from being executed, given that to all intents and purposes it is corrupt? Somehow, I need to stay in the bootloader to allow further re-programming. I am not covering against bad code - this isn't my question. — weblar83
Then transfer a checksum computed over the application bytes. Have your bootloader compute the checksum from the programmed data and compare the two. In any case, you should have one byte in flash where you store the bootloader "state" itself, i.e. "programming" or "normal", so that once it enters the "programming" state it will not enter the application after a power-down, just as you described. — KamilCuk
@KamilCuk yes, I think this is the solution. I think storing the checksum at the very top of flash may also be worthwhile so that I can validate this through reboots and power cycles. If this checksum is erased at the start of the firmware update then stored at the end (once the BL has compared and verified the checksum) then only verified applications can ever be executed from the bootloader. Thanks — weblar83
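
The check discussed in these comments could be sketched as follows. This is an illustration under assumptions, not the poster's code: the record layout, APP_VALID_MAGIC and the function names are hypothetical, and the record is assumed to sit in its own flash sector that is erased before the update and written only after the bootloader has verified the checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APP_VALID_MAGIC 0x600DC0DEu /* hypothetical "image verified" marker */

/* Record kept in a dedicated flash sector. After an erase every field
 * reads 0xFFFFFFFF, so an interrupted update leaves no valid record. */
typedef struct {
    uint32_t magic;
    uint32_t length; /* number of application bytes covered by the CRC */
    uint32_t crc32;
} app_record_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, the zlib/Ethernet
 * variant). Slow but table-free. */
static uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; apply the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns true only if the record is present, its length is sane, and
 * the CRC recomputed over flash matches the stored value. */
static bool app_image_valid(const app_record_t *rec,
                            const uint8_t *app_base,
                            size_t app_region_size)
{
    if (rec->magic != APP_VALID_MAGIC) {
        return false;
    }
    if (rec->length == 0u || rec->length > app_region_size) {
        return false;
    }
    return crc32_compute(app_base, rec->length) == rec->crc32;
}
```

On the target, app_base would point at 0x10000 and rec at the reserved sector. Because the check runs on every boot, it also covers power cycles after a successful update.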

recep recep · Accepted Answer · 2020-03-16T14:43:18

First, I recommend adding a delay in your bootloader during which it waits for an indicator that a firmware update is starting. I developed something similar: the desktop application sends a start byte periodically, and when you connect your device it enters bootloader mode and waits a further five seconds for new firmware information. That way it does not matter whether there is a valid main application in flash or not.

Another way to check that a main application exists is to reserve a specific flash sector for firmware information. Erase that sector before a firmware update begins, and write a known value to it only after the update has completed successfully. In the bootloader, read this sector to verify that there is a valid application in flash.
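
The two ideas can be combined into one boot decision. The sketch below is a rough illustration: START_BYTE, the enum and boot_poll are hypothetical names, and the caller is assumed to supply the millisecond tick, any received serial byte, and the result of reading the firmware information sector.

```c
#include <stdbool.h>
#include <stdint.h>

#define START_BYTE       0x7Fu /* hypothetical start indicator from the host */
#define UPDATE_WINDOW_MS 5000u /* the five-second wait described above */

typedef enum {
    BOOT_KEEP_WAITING,       /* window still open, poll again */
    BOOT_STAY_IN_BOOTLOADER, /* update requested, or no valid application */
    BOOT_JUMP_TO_APP         /* window expired and the marker is present */
} boot_action_t;

/* One polling step of the boot decision. It is a pure function so the
 * logic can be tested off-target. */
static boot_action_t boot_poll(uint32_t start_ms, uint32_t now_ms,
                               bool byte_received, uint8_t byte,
                               bool marker_valid)
{
    /* A start byte always wins, whatever the state of the application. */
    if (byte_received && byte == START_BYTE) {
        return BOOT_STAY_IN_BOOTLOADER;
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now_ms - start_ms) < UPDATE_WINDOW_MS) {
        return BOOT_KEEP_WAITING;
    }
    /* Window expired: only launch an application whose marker was
     * written at the end of a successful update. */
    return marker_valid ? BOOT_JUMP_TO_APP : BOOT_STAY_IN_BOOTLOADER;
}
```

The bootloader would call boot_poll in a loop until it returns something other than BOOT_KEEP_WAITING. Because the start byte is checked on every pass, even an application that passes validation but misbehaves can still be replaced by power cycling with the desktop tool connected.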
