I have a pcap of a VoIP call that uses SILK, and I can see the RTP payload in Wireshark. From the RTP headers I can work out the sample rate (e.g. 24 kHz) and the frame size (e.g. 20 ms). What I'd like to do is extract the RTP payload and generate a file containing the SILK-encoded audio. From the RTP payload format description I can see that, when the audio is stored in a file, each block of audio needs a block header that specifies the sample rate and block size (because the block size is variable and can differ from frame to frame).
How can I generate a file with the correct file header ("magic number") and add a block header for each block of audio?
I can use a few different programming languages, so I'm mainly interested in the required algorithm, but I would appreciate references to code implementations (or perhaps an existing tool?).
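To make the question concrete, here is a rough Python sketch of the two steps involved: stripping the RTP header from each packet, then writing the payloads behind a file header with one block header per block. The RTP parsing follows the fixed header layout from RFC 3550. Everything about the file layout is an assumption, not something confirmed by the payload format description: the magic string `#!SILK_V3`, and a block header consisting only of a 16-bit little-endian byte count, are placeholders to be replaced with whatever the format actually requires. The function names `rtp_payload` and `build_silk_file` are made up for illustration.

```python
import struct

RTP_FIXED_HEADER_LEN = 12  # bytes, per RFC 3550


def rtp_payload(packet: bytes) -> bytes:
    """Return the payload of a single RTP packet (the UDP payload bytes)."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    # Skip the fixed header and the CSRC list (4 bytes per entry).
    offset = RTP_FIXED_HEADER_LEN + 4 * csrc_count
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        ext_words = struct.unpack_from("!H", packet, offset + 2)[0]
        offset += 4 + 4 * ext_words

    end = len(packet)
    if has_padding:
        # The last byte holds the padding length, including itself.
        pad = packet[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid RTP padding length")
        end -= pad
    if offset > end:
        raise ValueError("RTP header runs past the end of the packet")
    return packet[offset:end]


def build_silk_file(payloads, magic: bytes = b"#!SILK_V3") -> bytes:
    """Concatenate payloads into one byte string: magic, then length-prefixed blocks.

    ASSUMPTION: both the magic string and the block header layout
    (16-bit little-endian byte count only) are guesses, not taken from the spec.
    """
    parts = [magic]
    for payload in payloads:
        if len(payload) > 0xFFFF:
            raise ValueError("payload too large for a 16-bit length field")
        parts.append(struct.pack("<H", len(payload)))
        parts.append(payload)
    return b"".join(parts)
```

With the RTP packets of one stream collected in capture order in a list `packets`, the output would be written with `open("call.silk", "wb").write(build_silk_file(rtp_payload(p) for p in packets))`. The sketch does not reorder packets by sequence number or handle lost packets, and it does not store the sample rate anywhere, so it only covers part of what the block header is supposed to carry.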