I am implementing RTP on an embedded MCU (STM32F4) and I'm having trouble streaming audio data (8 kHz, u-law encoded) efficiently.
When chunking the audio into 20 ms frames (160 bytes each), should I:
- Send a single RTP packet (12-byte header + 160 bytes of audio) in each UDP datagram, or
- Send multiple RTP packets in a single UDP datagram (since several would fit within the network MTU)?
If the second option, should there be an RTP header for each 160 bytes of audio data within the single UDP datagram? For example, 5 frames would be 800 bytes of audio data. Would I send:
- RTP_Header->Audio_data(160 bytes)->RTP_Header->Audio_data(160 bytes)->RTP_Header->Audio_data(160 bytes)...
- RTP_Header->Audio_data (800 bytes)
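For concreteness, here is a minimal sketch of the first layout from the first list (one 12-byte header plus one 160-byte frame per datagram). It assumes the static payload type 0 for PCMU from RFC 3551, no CSRC entries, no header extension, and the marker bit left clear. The names `rtp_stream_t` and `rtp_build_pcmu` are hypothetical, made up for illustration, and are not from any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RTP_HEADER_LEN   12u
#define PCMU_FRAME_BYTES 160u  /* 20 ms at 8 kHz, one byte per sample */
#define RTP_PT_PCMU      0u    /* static payload type for PCMU (RFC 3551) */

/* Hypothetical per-stream state, kept between packets. */
typedef struct {
    uint16_t seq;        /* increments by 1 per packet */
    uint32_t timestamp;  /* increments by samples per packet (160) */
    uint32_t ssrc;       /* stream identifier, chosen once per session */
} rtp_stream_t;

/* Writes one RTP packet (header + one 20 ms frame) into out.
 * Returns the packet length, or 0 if out is too small. */
size_t rtp_build_pcmu(rtp_stream_t *s, const uint8_t *audio,
                      uint8_t *out, size_t out_cap)
{
    if (out_cap < RTP_HEADER_LEN + PCMU_FRAME_BYTES) {
        return 0;
    }

    out[0] = 0x80;                        /* V=2, P=0, X=0, CC=0 */
    out[1] = (uint8_t)RTP_PT_PCMU;        /* M=0, PT=0 */
    out[2] = (uint8_t)(s->seq >> 8);      /* fields are big-endian */
    out[3] = (uint8_t)(s->seq & 0xFF);
    out[4] = (uint8_t)(s->timestamp >> 24);
    out[5] = (uint8_t)(s->timestamp >> 16);
    out[6] = (uint8_t)(s->timestamp >> 8);
    out[7] = (uint8_t)(s->timestamp);
    out[8]  = (uint8_t)(s->ssrc >> 24);
    out[9]  = (uint8_t)(s->ssrc >> 16);
    out[10] = (uint8_t)(s->ssrc >> 8);
    out[11] = (uint8_t)(s->ssrc);

    memcpy(out + RTP_HEADER_LEN, audio, PCMU_FRAME_BYTES);

    s->seq++;                             /* wraps naturally at 16 bits */
    s->timestamp += PCMU_FRAME_BYTES;     /* 160 samples at the 8 kHz clock */

    return RTP_HEADER_LEN + PCMU_FRAME_BYTES;
}
```

Each call produces a 172-byte buffer that would be handed to the UDP send routine once every 20 ms.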
Using Linphone as a client for testing, I am seeing multiple "out of time" packets and a short delay between speaking into my embedded device and hearing it on Linphone. I'm trying to determine whether streaming the data over UDP more efficiently will fix it. I do not get the same delay when speaking into Linphone and playing out of my embedded device, and the delay between the two is making echo cancellation on the embedded MCU difficult.