Schemes for streaming data with BLE GATT characteristics

Question

The GATT architecture of BLE lends itself to small fixed pieces of data (20 bytes max per characteristic). But in some cases, you end up wanting to “stream” some arbitrary length of data, that is greater than 20 bytes. For example, a firmware upgrade, even if you know its slow.

I’m curious what scheme others have used if any, to “stream” data (even if small and slow) over BLE characteristics.

I’ve used two different schemes to date:

One was to use a control characteristic, where the receiving device notify the sending device how much data it had received, and the sending device then used that to trigger the next write (I did both with_response, and without_response) on a different characteristic.

Another scheme I did recently, was to basically chunk the data into 19 byte segments, where the first byte indicates the number of packets to follow, when it hits 0, that clues the receiver that all of the recent updates can be concatenated and processed as a single packet.

The kind of answer I'm looking for, is an overview of how someone with experience has implemented a decent schema for doing this. And can justify why what they did is the best (or at least better) solution.

Nipo Nipo · Accepted Answer · 2016-05-14T14:29:35

After some review of existing protocols, I ended up designing a protocol for over-the-air update of my BLE peripherals.

Design assumptions

we cannot predict stack behavior (protocol will be used with all our products, whatever the chip used and the vendor stack, either on peripheral side or on central side, potentially unknown yet),
use standard GATT service,
avoid L2CAP fragmentation,
assume packets get queued before TX,
assume there may be some dropped packets (even if stacks should not),
avoid unnecessary packet round-trips,
put code complexity on central side,
assume 4.2 enhancements are unavailable.

1 implies 2-5, 6 is a performance requirement, 7 is optimization, 8 is portability.

Overall design

After discovery of service and reading a few read-only characteristics to check compatibility of device with image to be uploaded, all upload takes place between two characteristics:

payload (write only, without response),
status (notifiable).

The whole firmware image is sent in chunks through the payload characteristic.

Payload is a 20-byte characteristic: 4-byte chunk offset, plus 16-byte data chunk.

Status notifications tell whether there is an error condition or not, and next expected payload chunk offset. This way, uploader can tell whether it may go on speculatively, sending its chunks from its own offset, or if it should resume from offset found in status notification.

Status updates are sent for two main reasons:

when all goes well (payloads flying in, in order), at a given rate (like 4Hz, not on every packet),
on error (out of order, after some time without payload received, etc.), with the same given rate (not on every erroneous packet either).

Receiver expects all chunks in order, it does no reordering. If a chunk is out of order, it gets dropped, and an error status notification is pushed.

When a status comes in, it acknowledges all chunks with smaller offsets implicitly.

Lastly, there is a transmit window on the sender side, where many successful acknowledges flying allow sender to enlarge its window (send more chunks ahead of matching acknowledge). Window is reduced if errors happen, dropped chunks probably are because of a queue overflow somewhere.

Discussion

Using "one way" PDUs (write without response and notification) is to avoid 6. above, as ATT protocol explicitly tells acknowledged PDUs (write, indications) must not be pipelined (i.e. you may not send next PDU until you received response).

Status, containing the last received chunk, palliates 5.

To abide 2. and 3., payload is a 20-byte characteristic write. 4+16 has numerous advantages, one being the offset validation with a 16-byte chunk only involves shifts, another is that chunks are always page-aligned in target flash (better for 7.).

To cope with 4., more than one chunk is sent before receiving status update, speculating it will be correctly received.

This protocol has the following features:

it adapts to radio conditions,
it adapts to queues on sender side,
there is no status flooding from target,
queues are kept filled, this allows the whole central stack to use every possible TX opportunity.

Some parameters are out of this protocol:

central should enforce short connection interval (try to enforce it in the updater app);
slave PHY should be well-behaved with slave latency (YMMV, test your vendor's stack);
you should probably compress your payload to reduce transfer time.

Numbers

With:

15% compression,
a device connected with connectionInterval = 10ms,
a master PHY limiting every connection event to 4-5 TX packets,
average radio conditions.

I get 3.8 packets per connection event on average, i.e. ~6 kB/s of useful payload after packet loss, protocol overhead, etc.

This way, upload of a 60 kB image is done in less than 10 seconds, the whole process (connection, discovery, transfer, image verification, decompression, flashing, reboot) under 20 seconds.