
The GATT architecture of BLE lends itself to small, fixed pieces of data (20 bytes max per write or notification with the default ATT MTU). But in some cases you end up wanting to “stream” an arbitrary amount of data that is larger than 20 bytes, for example a firmware upgrade, even if you know it will be slow.

I’m curious what schemes others have used, if any, to “stream” data (even if small and slow) over BLE characteristics.

I’ve used two different schemes to date:

One was to use a control characteristic: the receiving device notified the sending device how much data it had received, and the sending device used that notification to trigger the next write (I tried both with_response and without_response) on a different characteristic.
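A minimal sketch of the sender side of that first scheme, assuming the progress notification carries a plain byte count and writes are 20 bytes. `ProgressDrivenSender` and `on_progress` are hypothetical names, and no real BLE API is involved:

```python
from typing import Optional


class ProgressDrivenSender:
    """Sends the next chunk only after the receiver reports how many bytes it has."""

    def __init__(self, data: bytes, chunk_size: int = 20) -> None:
        self.data = data
        self.chunk_size = chunk_size

    def on_progress(self, received: int) -> Optional[bytes]:
        # The receiver notified how many bytes it has stored so far; resume
        # from exactly that point, so a lost write is simply sent again.
        if received >= len(self.data):
            return None  # transfer complete
        return self.data[received:received + self.chunk_size]
```

Because every write waits for a progress notification, this costs one round-trip per chunk, which is what limits its throughput.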

Another scheme I used recently was to chunk the data into 19-byte segments, each prefixed with one byte giving the number of packets still to follow. When that count hits 0, the receiver knows that all of the recent updates can be concatenated and processed as a single packet.
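A rough sketch of that countdown-header framing, assuming a single header byte (so at most 256 segments per message) and 19 data bytes per segment. `split` and `Reassembler` are hypothetical names:

```python
from typing import List, Optional

SEGMENT_DATA = 19  # 20-byte write minus the 1-byte countdown header


def split(message: bytes) -> List[bytes]:
    """Cut a message into 19-byte segments, each prefixed by how many segments follow."""
    segments = [message[i:i + SEGMENT_DATA]
                for i in range(0, len(message), SEGMENT_DATA)] or [b""]
    if len(segments) > 256:
        raise ValueError("a 1-byte counter allows at most 256 segments")
    last = len(segments) - 1
    return [bytes([last - i]) + seg for i, seg in enumerate(segments)]


class Reassembler:
    """Collects segments until one arrives with a countdown of 0."""

    def __init__(self) -> None:
        self._parts: List[bytes] = []

    def feed(self, packet: bytes) -> Optional[bytes]:
        remaining, payload = packet[0], packet[1:]
        self._parts.append(payload)
        if remaining != 0:
            return None  # more segments to come
        message = b"".join(self._parts)
        self._parts = []
        return message
```

This framing alone cannot detect a lost segment; the receiver could additionally check that each counter is exactly one less than the previous one.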

The kind of answer I'm looking for is an overview of how someone with experience has implemented a decent scheme for doing this, and a justification of why what they did is the best (or at least a better) solution.


2 Answers


After some review of existing protocols, I ended up designing a protocol for over-the-air update of my BLE peripherals.

Design assumptions

  1. we cannot predict stack behavior (protocol will be used with all our products, whatever the chip used and the vendor stack, either on peripheral side or on central side, potentially unknown yet),
  2. use standard GATT service,
  3. avoid L2CAP fragmentation,
  4. assume packets get queued before TX,
  5. assume there may be some dropped packets (even if stacks should not),
  6. avoid unnecessary packet round-trips,
  7. put code complexity on central side,
  8. assume Bluetooth 4.2 enhancements (such as LE Data Packet Length Extension) are unavailable.

1 implies 2-5, 6 is a performance requirement, 7 is optimization, 8 is portability.

Overall design

After discovering the service and reading a few read-only characteristics to check that the device is compatible with the image to be uploaded, the whole upload takes place over two characteristics:

  • payload (write only, without response),
  • status (notifiable).

The whole firmware image is sent in chunks through the payload characteristic.

Payload is a 20-byte characteristic: 4-byte chunk offset, plus 16-byte data chunk.
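To make the layout concrete, here is a sketch of building and parsing those 20-byte writes with Python's `struct` module. The little-endian byte order and padding the last chunk with 0xFF are assumptions, not something the protocol above specifies:

```python
import struct
from typing import Iterator, Tuple

CHUNK_SIZE = 16
# Assumed layout: little-endian uint32 offset followed by 16 data bytes (20 bytes total).
PAYLOAD = struct.Struct("<I16s")


def build_payloads(image: bytes) -> Iterator[bytes]:
    """Yield one 20-byte write per 16-byte chunk of the image."""
    for offset in range(0, len(image), CHUNK_SIZE):
        chunk = image[offset:offset + CHUNK_SIZE]
        # Pad a short final chunk with 0xFF (erased-flash value; an assumption).
        yield PAYLOAD.pack(offset, chunk.ljust(CHUNK_SIZE, b"\xff"))


def parse_payload(payload: bytes) -> Tuple[int, bytes]:
    """Split a received write back into (offset, chunk)."""
    offset, chunk = PAYLOAD.unpack(payload)
    return offset, chunk
```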

Status notifications tell whether or not there is an error condition, and give the next expected payload chunk offset. This way, the uploader can tell whether it may go on speculatively, sending chunks from its own offset, or whether it should resume from the offset found in the status notification.

Status updates are sent for two main reasons:

  • when all goes well (payloads flying in, in order), at a given rate (like 4Hz, not on every packet),
  • on error (out of order, after some time without payload received, etc.), with the same given rate (not on every erroneous packet either).

The receiver expects all chunks in order; it does no reordering. If a chunk arrives out of order, it is dropped and an error status notification is pushed.

When a status comes in, it implicitly acknowledges all chunks with offsets smaller than the one it reports.
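The receiver rules above (strict in-order acceptance, dropping out-of-order chunks, rate-limited status) could be sketched like this. `Status` and `Receiver` are hypothetical, the 4 Hz rate is the example from the list above, and the timeout-based error status is left out; a real peripheral would also drive notifications from a timer rather than only on payload arrival:

```python
from dataclasses import dataclass
from typing import Optional

STATUS_PERIOD_S = 0.25  # 4 Hz, the example rate given above


@dataclass
class Status:
    error: bool       # True if an out-of-order chunk was dropped
    next_offset: int  # next chunk offset the receiver expects


class Receiver:
    """In-order receiver: no reordering, rate-limited status notifications."""

    def __init__(self) -> None:
        self.next_offset = 0
        self.image = bytearray()
        self.error = False
        self._last_status = float("-inf")

    def on_payload(self, offset: int, chunk: bytes, now: float) -> Optional[Status]:
        if offset == self.next_offset:
            self.image += chunk
            self.next_offset += len(chunk)
            self.error = False
        else:
            # Out-of-order chunk: drop it and flag the error.
            self.error = True
        return self._maybe_status(now)

    def _maybe_status(self, now: float) -> Optional[Status]:
        # At most one notification per STATUS_PERIOD_S, whatever arrives.
        if now - self._last_status < STATUS_PERIOD_S:
            return None
        self._last_status = now
        return Status(self.error, self.next_offset)
```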

Lastly, there is a transmit window on the sender side: a run of successful acknowledgements lets the sender enlarge its window (send more chunks ahead of the matching acknowledgement). The window is reduced when errors happen, since dropped chunks are probably caused by a queue overflow somewhere.
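The exact growth and shrink rules are not specified here. The sketch below borrows the additive-increase/multiplicative-decrease rule from TCP congestion control as one plausible choice; `WindowedSender`, the initial window of 2 and the cap of 32 chunks are all assumptions:

```python
from typing import List


class WindowedSender:
    """Speculative sender whose window grows on success and shrinks on error."""

    def __init__(self, image_len: int, chunk_size: int = 16,
                 initial_window: int = 2, max_window: int = 32) -> None:
        self.image_len = image_len
        self.chunk_size = chunk_size
        self.window = initial_window  # chunks allowed ahead of the last acknowledgement
        self.max_window = max_window
        self.acked = 0                # receiver confirmed everything below this offset
        self.next_offset = 0          # offset of the next chunk to send

    def chunks_to_send(self) -> List[int]:
        """Return the chunk offsets that may be written now without exceeding the window."""
        limit = min(self.acked + self.window * self.chunk_size, self.image_len)
        offsets = list(range(self.next_offset, limit, self.chunk_size))
        if offsets:
            self.next_offset = offsets[-1] + self.chunk_size
        return offsets

    def on_status(self, error: bool, next_expected: int) -> None:
        self.acked = max(self.acked, next_expected)  # implicit ack of earlier chunks
        if error:
            # Dropped chunks suggest a queue overflow: halve the window and rewind.
            self.window = max(1, self.window // 2)
            self.next_offset = self.acked
        else:
            # Success: additively grow the window, up to the cap.
            self.window = min(self.max_window, self.window + 1)

    @property
    def done(self) -> bool:
        return self.acked >= self.image_len
```

On an error status the sender rewinds to the receiver's offset, which is the "resume from the offset found in the status notification" case described earlier.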

Discussion

Using "one-way" PDUs (write without response and notification) satisfies 6 above: the ATT protocol explicitly states that acknowledged PDUs (write requests, indications) must not be pipelined, i.e. you may not send the next one until you have received the response.

The status, which reports the next expected chunk offset, mitigates 5.

To abide by 2 and 3, the payload is a 20-byte characteristic write, which fits in a single ATT PDU with the default 23-byte MTU. The 4+16 split has numerous advantages: one is that offset validation with a 16-byte chunk only involves shifts, another is that chunks are always page-aligned in target flash (better for 7).
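As an illustration of those shift-only checks, assuming 16-byte chunks and a hypothetical 4 KiB flash page:

```python
CHUNK_SHIFT = 4                      # 16-byte chunks
CHUNK_MASK = (1 << CHUNK_SHIFT) - 1  # 0xF
PAGE_SHIFT = 12                      # hypothetical 4 KiB flash page


def is_valid_offset(offset: int, next_expected: int) -> bool:
    """Chunk-aligned (mask test) and exactly the offset the receiver is waiting for."""
    return (offset & CHUNK_MASK) == 0 and offset == next_expected


def chunk_index(offset: int) -> int:
    return offset >> CHUNK_SHIFT


def page_of(offset: int) -> int:
    return offset >> PAGE_SHIFT
```

Since the page size is a multiple of 16, an aligned chunk never straddles two flash pages, so the target never has to split a chunk across page writes.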

To cope with 4, several chunks are sent before a status update is received, speculating that they will be received correctly.

This protocol has the following features:

  • it adapts to radio conditions,
  • it adapts to queues on sender side,
  • there is no status flooding from target,
  • queues are kept full, which lets the whole central stack use every possible TX opportunity.

Some parameters are outside the scope of this protocol:

  • the central should use a short connection interval (try to enforce it from the updater app);
  • slave PHY should be well-behaved with slave latency (YMMV, test your vendor's stack);
  • you should probably compress your payload to reduce transfer time.

Numbers

With:

  • 15% compression,
  • a device connected with connectionInterval = 10ms,
  • a master PHY limiting every connection event to 4-5 TX packets,
  • average radio conditions.

I get 3.8 packets per connection event on average, i.e. ~6 kB/s of useful payload after packet loss, protocol overhead, etc.

This way, a 60 kB image is uploaded in less than 10 seconds, and the whole process (connection, discovery, transfer, image verification, decompression, flashing, reboot) takes under 20 seconds.
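A quick check of those figures, taking "60 kB" as 60,000 bytes and "15% compression" as a 15% size reduction (both assumptions):

```python
CONNECTION_INTERVAL_S = 0.010  # connectionInterval = 10 ms
PACKETS_PER_EVENT = 3.8        # measured average quoted above
DATA_BYTES_PER_PACKET = 16     # 20-byte write minus the 4-byte offset

IMAGE_BYTES = 60_000           # "60 kB" taken as 60,000 bytes (assumption)
COMPRESSION_SAVING = 0.15      # "15% compression" taken as a 15% size reduction (assumption)

throughput = PACKETS_PER_EVENT * DATA_BYTES_PER_PACKET / CONNECTION_INTERVAL_S  # bytes/s
compressed_bytes = IMAGE_BYTES * (1 - COMPRESSION_SAVING)
transfer_s = compressed_bytes / throughput

print(f"{throughput:.0f} B/s, {compressed_bytes:.0f} B, {transfer_s:.1f} s")
# prints: 6080 B/s, 51000 B, 8.4 s
```

That is consistent with the ~6 kB/s and "less than 10 seconds" figures above.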


It depends a bit on what kind of central device you have. Generally, Write Without Response is the way to stream data over BLE. Packets should not be received out of order, since BLE's link layer never sends the next packet before the previous one has been acknowledged.

For Android it's very easy: just use Write Without Response to send all packets, one after another. Once you get the onCharacteristicWrite callback, send the next packet. That way Android queues up the packets for you and applies its own flow control: when all its buffers are full, onCharacteristicWrite is only called once there is space again.

iOS is not that smart, however. If you send a lot of Write Without Response packets while the internal buffers are full, iOS silently drops the new packets. There are two ways around this. One is to implement some (possibly complex) protocol where the peripheral notifies the status of the transmission, like Nipo's answer. An easier way is to send every 10th packet or so as a Write With Response and the rest as Write Without Response. That way iOS queues up all packets for you and does not drop the Write Without Response packets. The only downside is that each Write With Response requires one round-trip. This scheme should nevertheless give you high throughput.
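A platform-neutral sketch of that mixed scheme, deciding per packet which write type to use. `plan_writes` and `WriteType` are hypothetical; on iOS each entry would map to `CBPeripheral.writeValue(_:for:type:)` with `.withResponse` or `.withoutResponse`:

```python
from enum import Enum
from typing import List, Tuple


class WriteType(Enum):
    WITHOUT_RESPONSE = "without_response"
    WITH_RESPONSE = "with_response"


def plan_writes(packets: List[bytes], every: int = 10) -> List[Tuple[WriteType, bytes]]:
    """Mark every `every`-th packet as Write With Response, the rest as Without."""
    plan = []
    for i, packet in enumerate(packets, start=1):
        # The acknowledged write forces the stack to drain its queue first,
        # so the unacknowledged writes before it are not silently dropped.
        kind = WriteType.WITH_RESPONSE if i % every == 0 else WriteType.WITHOUT_RESPONSE
        plan.append((kind, packet))
    return plan
```

A larger `every` means fewer round-trips but more packets at risk between acknowledged writes.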