8
votes

I'm currently trying to use Android as a Skype endpoint. At this stage, I need to encode video into H.264 (since it's the only format supported by Skype) and encapsulate it with RTP in order to make the streaming work.

Apparently MediaRecorder is not well suited to this, for several reasons. One is that it adds the MP4 or 3GP headers only after recording has finished. Another is that hardware acceleration may come in handy to keep latency to a minimum. That's why I would like to use the recent low-level additions to the framework: MediaCodec, MediaExtractor, etc.

At the moment, I plan to work as follows. The camera writes its video into a buffer. MediaCodec encodes the video as H.264 and writes the result to another buffer. This buffer is read by an RTP encapsulator, which sends the stream data to the server. Here's my first question: does this plan sound feasible to you?

I'm already stuck at step one. Since all the documentation I can find online about using the camera relies on MediaRecorder, I cannot find a way to store the camera's raw data in a buffer before encoding. Is addCallbackBuffer suited for this? Does anyone have a link to an example?

Next, I cannot find much documentation about MediaCodec (since it's fairly new). Does anyone have a solid tutorial?

Lastly: any recommendations on RTP libraries?

Thanks a lot in advance!


3 Answers

9
votes

UPDATE
I was finally able to create proper RTP packets from the H.264 frames. Here's what you have to keep in mind (it's actually quite simple):

The encoder does create a NAL header for each frame, but it returns each frame as an H.264 byte stream (Annex B format). This means that each frame starts with a start code: three 0x00 bytes followed by a 0x01 byte. All you have to do is remove that start code and put the frame into an RTP packet (or split it across several packets using FU-As if it is too large).
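To make the two steps above concrete, here is a minimal sketch in plain Java. The class and method names (H264Packetizer, stripStartCode, toRtpPayloads) are made up for illustration, and the sketch assumes each encoder output buffer holds exactly one NAL unit. The FU-A layout (an indicator byte with type 28, then a header byte carrying the start/end bits) follows RFC 6184; the RTP header itself is not added here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class H264Packetizer {
    /** Returns the NAL unit without its leading Annex B start code (00 00 00 01 or 00 00 01). */
    static byte[] stripStartCode(byte[] frame) {
        if (frame.length >= 4 && frame[0] == 0 && frame[1] == 0 && frame[2] == 0 && frame[3] == 1) {
            return Arrays.copyOfRange(frame, 4, frame.length);
        }
        if (frame.length >= 3 && frame[0] == 0 && frame[1] == 0 && frame[2] == 1) {
            return Arrays.copyOfRange(frame, 3, frame.length);
        }
        return frame; // no start code found; assume it is already a bare NAL unit
    }

    /** Returns one payload if the NAL unit fits into maxPayload bytes, FU-A fragments otherwise. */
    static List<byte[]> toRtpPayloads(byte[] nal, int maxPayload) {
        if (maxPayload < 3) {
            throw new IllegalArgumentException("maxPayload must leave room for the two FU-A bytes");
        }
        List<byte[]> out = new ArrayList<>();
        if (nal.length <= maxPayload) {
            out.add(nal); // single NAL unit packet
            return out;
        }
        int nalHeader = nal[0] & 0xFF;
        byte fuIndicator = (byte) ((nalHeader & 0xE0) | 28); // keep F and NRI bits, type 28 = FU-A
        int nalType = nalHeader & 0x1F;
        int chunk = maxPayload - 2; // two bytes are used by FU indicator and FU header
        // Start at 1: the original NAL header byte is not copied, it is rebuilt from the FU bytes.
        for (int pos = 1; pos < nal.length; pos += chunk) {
            int len = Math.min(chunk, nal.length - pos);
            int fuHeader = nalType;
            if (pos == 1) fuHeader |= 0x80;                 // S bit: first fragment
            if (pos + len == nal.length) fuHeader |= 0x40;  // E bit: last fragment
            byte[] fragment = new byte[len + 2];
            fragment[0] = fuIndicator;
            fragment[1] = (byte) fuHeader;
            System.arraycopy(nal, pos, fragment, 2, len);
            out.add(fragment);
        }
        return out;
    }
}
```

Each returned payload would then go into its own RTP packet, with consecutive sequence numbers and the same timestamp for all fragments of one frame.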

Now to your questions:

I cannot find a way to store its raw data into a buffer before encoding. Is addCallbackBuffer suited for this?

You should use camera.setPreviewCallback(...) and feed each preview frame you receive into the encoder.

I cannot find a lot of documentation about MediaCodec (since it's fairly new). Anyone who has a solid tutorial?

This should be a good introduction to how MediaCodec works: http://dpsm.wordpress.com/2012/07/28/android-mediacodec-decoded/

Lastly: any recommendations on RTP libraries?

I'm using jlibrtp which gets the job done.

6
votes

I don't know anything about MediaCodec or MediaExtractor yet, but I am fairly familiar with MediaRecorder and have successfully implemented an RTSP server, based on SpyDroid, that captures H264/AMRNB output from MediaRecorder. The basic idea is that the code creates a local socket pair and uses setOutputFile of the MediaRecorder to write output to one of the sockets in the pair. Then, the program reads the video or audio stream from the other socket, parses it into packets, and then wraps each packet into one or more RTP packets which are sent over UDP.
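As a rough illustration of the "wrap each packet into RTP" step, here is a minimal sketch that prepends the 12-byte fixed RTP header from RFC 3550 to a payload. The class name RtpPacket and its build method are hypothetical (this is not SpyDroid's code), and the sketch assumes no CSRC list, no header extension and no padding.

```java
import java.nio.ByteBuffer;

final class RtpPacket {
    /**
     * Builds one RTP packet: 12-byte fixed header followed by the payload.
     * seq is truncated to 16 bits and timestamp to 32 bits, as the header requires.
     */
    static byte[] build(int payloadType, boolean marker, int seq, long timestamp, int ssrc, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(12 + payload.length); // ByteBuffer is big-endian by default
        buf.put((byte) 0x80);                                          // V=2, P=0, X=0, CC=0
        buf.put((byte) ((marker ? 0x80 : 0) | (payloadType & 0x7F)));  // marker bit + payload type
        buf.putShort((short) seq);                                     // sequence number
        buf.putInt((int) timestamp);                                   // timestamp (90 kHz clock for H264)
        buf.putInt(ssrc);                                              // synchronization source id
        buf.put(payload);
        return buf.array();
    }
}
```

The resulting byte array can be sent as the body of a java.net.DatagramPacket. For H264 the marker bit is normally set on the last packet of a frame, and the payload type must match the one announced in the SDP (96 in the string further down).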

It's true that MediaRecorder adds the MOOV header only after it has finished, but that's not a problem if you're serving H264 video in RTP format. Basically, there's an "mdat" header at the start of the video stream. It has 4 bytes for the length of the header, followed by the 4 bytes "mdat". Read the length to find out how long the header is, verify that it's the mdat header, and then skip the rest of the header data. From there on, you get a stream of NAL units, each of which starts with 4 bytes giving the unit length. Small NAL units can be sent in a single RTP packet, and larger units get broken up into FU packets. For RTSP, you also need to serve an SDP header that describes the stream. SpyDroid calculates the info in the SDP header by writing a very short movie to a file and then reading that file to extract the MOOV header from the end. My app always uses the same size, format, and bit rate, so I just serve a static string:

public static final String SDP_STRING =
        "m=video 5006 RTP/AVP 96\n"
                + "b=RR:0\n"
                + "a=rtpmap:96 H264/90000\n"
                + "a=fmtp:96 packetization-mode=1;profile-level-id=428028;sprop-parameter-sets=Z0KAKJWgKA9E,aM48gA==;\n"
                + "a=control:trackID=0\n"
                + "m=audio 5004 RTP/AVP 96\n"
                + "b=AS:128\n"
                + "b=RR:0\n"
                + "a=rtpmap:96 AMR/8000\n"
                + "a=fmtp:96 octet-align=1;\n"
                + "a=control:trackID=1\n";

That's my header for 640x480x10fps, H264 video, with 8000/16/1 AMRNB audio.
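If you can't hard-code the string, the video fmtp line can be rebuilt from the SPS and PPS NAL units once you have them. The sketch below is only an illustration with made-up names (H264Fmtp, build); it relies on the sprop-parameter-sets value being the Base64-encoded SPS and PPS separated by a comma, and on profile-level-id being bytes 1 to 3 of the SPS in hex, as RFC 6184 describes.

```java
import java.util.Base64;

final class H264Fmtp {
    /** Builds the a=fmtp line for H264 from raw SPS and PPS NAL units (including their NAL header byte). */
    static String build(int payloadType, byte[] sps, byte[] pps) {
        if (sps.length < 4) {
            throw new IllegalArgumentException("SPS too short");
        }
        // Bytes 1..3 of the SPS are profile_idc, the constraint flags and level_idc.
        String profileLevelId = String.format("%02x%02x%02x", sps[1] & 0xFF, sps[2] & 0xFF, sps[3] & 0xFF);
        Base64.Encoder b64 = Base64.getEncoder();
        return "a=fmtp:" + payloadType
                + " packetization-mode=1;profile-level-id=" + profileLevelId
                + ";sprop-parameter-sets=" + b64.encodeToString(sps) + "," + b64.encodeToString(pps) + ";";
    }
}
```

Decoding the two Base64 values from my string above and passing them back in should reproduce the same fmtp line, which is a quick way to sanity-check the function.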

One thing I can warn you about: If you're using MediaRecorder, your preview callback will never get called. That only works in camera mode, not when you're recording video. I haven't been able to find any way of getting access to the preview image in uncompressed format while the video is recording.

I highly recommend looking over the code for SpyDroid. It takes some digging around, but I bet what you want is in there already.
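For reference, the mdat and NAL-length parsing I described earlier in this answer might look roughly like the sketch below. It is plain Java with hypothetical names (MdatNalReader, seekToMdat, nextNal), not SpyDroid's actual code. It assumes box sizes are plain 32-bit values, and it skips any box that comes before mdat (for example ftyp) in full rather than assuming mdat is the very first thing in the stream.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MdatNalReader {
    private final DataInputStream in;

    MdatNalReader(InputStream source) {
        this.in = new DataInputStream(source);
    }

    /** Reads box headers until "mdat" is found; afterwards the stream is positioned at the first NAL length. */
    void seekToMdat() throws IOException {
        while (true) {
            int size = in.readInt();              // 4-byte big-endian box size
            byte[] type = new byte[4];
            in.readFully(type);                   // 4-byte box type, e.g. "ftyp" or "mdat"
            if ("mdat".equals(new String(type, StandardCharsets.US_ASCII))) {
                return;                           // the size of a streamed mdat is not reliable, so ignore it
            }
            if (size < 8) {
                throw new IOException("unsupported box size " + size);
            }
            in.readFully(new byte[size - 8]);     // skip the body of a box we do not care about
        }
    }

    /** Returns the next NAL unit (4-byte length prefix removed), or null at end of stream. */
    byte[] nextNal() throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        if (length <= 0) {
            throw new IOException("bad NAL length " + length);
        }
        byte[] nal = new byte[length];
        in.readFully(nal);
        return nal;
    }
}
```

In my setup the InputStream would be the receiving end of the local socket pair; each array returned by nextNal() is then either sent as a single RTP packet or split into FU packets.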

0
votes

What you plan is definitely feasible. You can register a Camera.PreviewCallback that takes the picture data and puts it into the MediaCodec. You then read the output and send it as RTP. In general it's easy, but there are various pitfalls, such as undocumented color spaces and MediaCodec behaving differently on different devices. Still, it's definitely possible.
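One example of the color space pitfall: the camera preview delivers NV21 by default (Y plane followed by interleaved V/U bytes), while many encoders expect NV12 (Y plane followed by interleaved U/V) when configured for a semi-planar YUV 4:2:0 input format. Which format a given device's encoder really wants is an assumption you have to verify per device. The helper below (hypothetical name YuvConvert.nv21ToNv12) is a minimal sketch of the byte swap, assuming even width and height and no row padding.

```java
final class YuvConvert {
    /** Converts an NV21 frame (Y plane, then V/U pairs) to NV12 (Y plane, then U/V pairs). */
    static byte[] nv21ToNv12(byte[] nv21, int width, int height) {
        int ySize = width * height;
        int total = ySize + ySize / 2;            // 4:2:0: chroma takes half the size of the Y plane
        if (nv21.length < total) {
            throw new IllegalArgumentException("buffer too small for " + width + "x" + height);
        }
        byte[] nv12 = new byte[total];
        System.arraycopy(nv21, 0, nv12, 0, ySize); // the Y plane is identical in both formats
        for (int i = ySize; i + 1 < total; i += 2) {
            nv12[i] = nv21[i + 1];                 // U comes second in NV21, first in NV12
            nv12[i + 1] = nv21[i];                 // V comes first in NV21, second in NV12
        }
        return nv12;
    }
}
```

If the colors in the encoded video look swapped (blue skin tones are the classic symptom), a conversion like this in the preview callback is the first thing I would try.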