After some review of existing protocols, I ended up designing a protocol for over-the-air update of my BLE peripherals.
Design assumptions
- we cannot predict stack behavior (protocol will be used with all our products, whatever the chip used and the vendor stack, either on peripheral side or on central side, potentially unknown yet),
- use standard GATT service,
- avoid L2CAP fragmentation,
- assume packets get queued before TX,
- assume there may be some dropped packets (even if stacks should not),
- avoid unnecessary packet round-trips,
- put code complexity on central side,
- assume 4.2 enhancements are unavailable.
1 implies 2-5, 6 is a performance requirement, 7 is optimization, 8 is portability.
Overall design
After discovery of service and reading a few read-only characteristics to check compatibility of device with image to be uploaded, all upload takes place between two characteristics:
- payload (write only, without response),
- status (notifiable).
The whole firmware image is sent in chunks through the payload characteristic.
Payload is a 20-byte characteristic: 4-byte chunk offset, plus 16-byte data chunk.
Status notifications tell whether there is an error condition or not, and next expected payload chunk offset. This way, uploader can tell whether it may go on speculatively, sending its chunks from its own offset, or if it should resume from offset found in status notification.
Status updates are sent for two main reasons:
- when all goes well (payloads flying in, in order), at a given rate (like 4Hz, not on every packet),
- on error (out of order, after some time without payload received, etc.), with the same given rate (not on every erroneous packet either).
Receiver expects all chunks in order, it does no reordering. If a chunk is out of order, it gets dropped, and an error status notification is pushed.
When a status comes in, it acknowledges all chunks with smaller offsets implicitly.
Lastly, there is a transmit window on the sender side, where many successful acknowledges flying allow sender to enlarge its window (send more chunks ahead of matching acknowledge). Window is reduced if errors happen, dropped chunks probably are because of a queue overflow somewhere.
Discussion
Using "one way" PDUs (write without response and notification) is to avoid 6. above, as ATT protocol explicitly tells acknowledged PDUs (write, indications) must not be pipelined (i.e. you may not send next PDU until you received response).
Status, containing the last received chunk, palliates 5.
To abide 2. and 3., payload is a 20-byte characteristic write. 4+16 has numerous advantages, one being the offset validation with a 16-byte chunk only involves shifts, another is that chunks are always page-aligned in target flash (better for 7.).
To cope with 4., more than one chunk is sent before receiving status update, speculating it will be correctly received.
This protocol has the following features:
- it adapts to radio conditions,
- it adapts to queues on sender side,
- there is no status flooding from target,
- queues are kept filled, this allows the whole central stack to use every possible TX opportunity.
Some parameters are out of this protocol:
- central should enforce short connection interval (try to enforce it in the updater app);
- slave PHY should be well-behaved with slave latency (YMMV, test your vendor's stack);
- you should probably compress your payload to reduce transfer time.
Numbers
With:
- 15% compression,
- a device connected with connectionInterval = 10ms,
- a master PHY limiting every connection event to 4-5 TX packets,
- average radio conditions.
I get 3.8 packets per connection event on average, i.e. ~6 kB/s of useful payload after packet loss, protocol overhead, etc.
This way, upload of a 60 kB image is done in less than 10 seconds, the whole process (connection, discovery, transfer, image verification, decompression, flashing, reboot) under 20 seconds.