2
votes

I am working on a Java project without any third-party libraries. I have successfully established a UDP connection using DatagramSocket and started communicating over the SIP protocol. I got through the registration and invitation stages, which gave me the host and port to which the audio stream will be transmitted. I then opened a connection to that new address with another DatagramSocket and began receiving data in the form of RTP packets. From each packet I can successfully extract the payload type (in my case 8, i.e. PCMA), the timestamp, the sequence number, and the payload data (a byte array). Now I want to process the received data so I can use it later: save it to disk, convert it to another audio format, play it back, and so on. I can't figure out what exactly needs to be done with the byte array extracted from the packet.
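For reference, the extraction described above can be sketched like this. The class and field names are illustrative, not from the original project; the layout is the fixed 12-byte RTP header from RFC 3550 (CSRC list included, extensions ignored):

```java
import java.util.Arrays;

public class RtpPacket {
    public final int payloadType;      // PT field: 8 = PCMA (A-law)
    public final int sequenceNumber;   // 16-bit, wraps around
    public final long timestamp;       // 32-bit, read as unsigned
    public final byte[] payload;       // encoded audio samples

    // Parse an RTP packet (RFC 3550). `length` is the datagram length,
    // which may be shorter than the buffer handed to DatagramPacket.
    public RtpPacket(byte[] buf, int length) {
        int version = (buf[0] >> 6) & 0x03;
        if (version != 2) throw new IllegalArgumentException("not RTP v2");
        int csrcCount = buf[0] & 0x0F;                  // CC field
        payloadType = buf[1] & 0x7F;                    // mask out the marker bit
        sequenceNumber = ((buf[2] & 0xFF) << 8) | (buf[3] & 0xFF);
        timestamp = ((long) (buf[4] & 0xFF) << 24) | ((buf[5] & 0xFF) << 16)
                  | ((buf[6] & 0xFF) << 8) | (buf[7] & 0xFF);
        int headerLength = 12 + 4 * csrcCount;          // fixed header + CSRC list
        payload = Arrays.copyOfRange(buf, headerLength, length);
    }
}
```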

Let's say, for a start, I want to save the received data to a file in the format AudioFormat.Encoding.PCM_FLOAT, 8000.0 Hz, 8 bit, mono, 160 bytes/frame. What do I need to do to achieve this?

You need to extract the RTP payload and apply an ALAW decoder (which is a simple lookup table) to get the PCM bytes. Here's a C# version that I imagine you can easily port to Java: github.com/sipsorcery/sipsorcery/blob/master/src/app/Media/… — sipsorcery
I am trying to follow the example you suggested and found the code where this table is used: github.com/sipsorcery/sipsorcery/blob/master/src/app/Media/… I cannot figure out how to correctly map the bytes from the packet's array to this table. In the example I see that the byte value is used as the array index, but in my case I get both positive and negative numbers (for example, -43) in the payload. Did I read the payload incorrectly, or do I need to process each byte somehow before matching it against a table element? — Stanley Wintergreen
The index into the ALAW table is an unsigned byte. You get back a signed 16-bit PCM sample, which is what a lot of sound APIs expect to drive a speaker. In other words, you supply a one-byte ALAW-encoded sample and get back a two-byte PCM sample. Typically the two-byte output is treated as a signed 16-bit short, but sometimes it needs to be converted to a float between -1.0 and 1.0. It all depends on the sound API you want to feed the PCM samples to. — sipsorcery
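Putting the comments above together, a minimal A-law decoder in Java might look like this. The class and method names are my own; the segment arithmetic follows the public-domain G.711 reference code (e.g. Sun's g711.c), computed on the fly rather than via a precomputed table. Note the `& 0xFF` mask: Java bytes are signed, so the payload byte -43 is really the unsigned index 213 (0xD5):

```java
public class ALawDecoder {
    // G.711 A-law -> signed 16-bit linear PCM, one sample per input byte.
    public static short decode(byte aLaw) {
        int a = (aLaw & 0xFF) ^ 0x55;          // unsign, then undo even-bit inversion
        int t = (a & 0x0F) << 4;               // mantissa
        int seg = (a & 0x70) >> 4;             // segment (exponent)
        if (seg == 0)      t += 8;
        else if (seg == 1) t += 0x108;
        else             { t += 0x108; t <<= seg - 1; }
        return (short) ((a & 0x80) != 0 ? t : -t);
    }

    // Decode a whole RTP payload to little-endian 16-bit PCM bytes,
    // ready for javax.sound.sampled.
    public static byte[] decode(byte[] payload) {
        byte[] pcm = new byte[payload.length * 2];
        for (int i = 0; i < payload.length; i++) {
            short s = decode(payload[i]);
            pcm[2 * i]     = (byte) (s & 0xFF);  // low byte first
            pcm[2 * i + 1] = (byte) (s >> 8);
        }
        return pcm;
    }
}
```

A 160-byte PCMA payload (20 ms at 8000 Hz) thus becomes 320 bytes of 16-bit PCM.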

1 Answer

2
votes

Using the example by Aaron Clauson, I solved my problem. The byte array from the packet must be converted according to the proposed scheme. I made a working example of how this might look in Kotlin.

An example of how to get the data out of a raw RTP packet.

If you glue the resulting arrays together in the correct order (by sequenceNumber), discarding duplicates, then you can, for example, write the result to a WAV file using javax.sound.sampled.AudioInputStream, as in the example.
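A minimal sketch of that last step, assuming the glued-together data is already 16-bit little-endian mono PCM at 8000 Hz (what an A-law decoder produces); the class name is illustrative:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

public class WavWriter {
    // Write 16-bit little-endian mono PCM at 8000 Hz to a WAV file.
    public static void write(byte[] pcm, File out) throws IOException {
        AudioFormat fmt = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                8000f,   // sample rate of the PCMA stream
                16,      // bits per sample after decoding
                1,       // mono
                2,       // frame size in bytes
                8000f,   // frame rate
                false);  // little-endian
        try (AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(pcm), fmt, pcm.length / 2)) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
        }
    }
}
```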

A simplified example of how you can play the audio stream in real time on Android, and a simplified example of how you can send sound from a microphone to a server.