Synchronization of data with video using WebRTC

Question

I'm using WebRTC to send video from a server to client browser (using the native WebRTC API and an MCU WebRTC server like Kurento).

Before sending it to clients each frame of the video contained metadata (like subtitles or any other applicative content). I'm looking for a way to send this metadata to the client such that it remains synchronized (to the time it is actually presented). In addition I would like to be able to access this data from the client side (by Javascript).

Some options I thought about:

Sending the data by WebRTC DataChannel. But I don't know how to ensure the data is synchronized on a per-frame basis. But I couldn't find a way to ensure the data sent by the data channel and the video channel is synchronize (again, I hope to get precision level of single frame).
Sending the data manually to the client in some way (WebRTC DataChannel, websockets, etc.) with timestamps that match the video's timestamps. However, even if Kurento or other middle servers preserve the timestamp information in the video, according to the following answer there is no applicative way to get the video timestamps from the javascript: How can use the webRTC Javascript API to access the outgoing audio RTP timestamp at the sender and the incoming audio RTP timestamp at the receiver?. I thought about using the standard video element's timeupdate event, but I don't konw if it will work for precision level of frame, and I'm not sure what it means in a live video as in WebRTC.
Sending the data manually and attach it to the video applicatively as another TextTrack. Then use the onenter and onexit to read it synchronizely: http://www.html5rocks.com/en/tutorials/track/basics/. It still requires precise timestamps, and I'm not sure how to know what are the timestamps and if Kurento pass them as-is.
Using the statistics API of WebRTC to manually count frames (using getstats), and hope that the information provided by this API is precise.

What is the best way to do that, and how to solve the problems I mentioned in either way?

EDIT: Precise synchronization (in resolution of no more than a single frame ) of metadata with the appropriate frame is required.

You will never get perfect synchronized streams if you separate them. You could implement a buffering systems to ensure no forward progress until there is an acceptable buffer available in both streams. Your best bet is to forget the perfect frame to frame match, if you want that then encode it into the video stream as video on the fly. Apart from audio and graphics, i can not think why you would need such a high precision. One you forget the perfect timing things get a lot simpler. — Blindman67
Thanks, good point. Anyway the question is about how to do that programmatically, assuming that I could ensure that the metadata stream has been already reached to the browser before the video stream. Your suggestion to re-encode the video sounds nice, but I still need to match the times of the video stream and metadata stream - I'm even not sure that the middle server preserves the presentation timestamps. — MaMazav
Media streams provide some help. If you are using HTML5 video you can use buffered to return a TimeRanges object to let you know what has been buffered. The HTMLMediaElement interface provides currentTime as a read write attribute. You can use it to get the time in seconds of the video. To get the current frame number frameNumber = Math.floor(videoElement.currentTime / frameRate); Writing to currentTime will cause the video to seek to that time. — Blindman67

Erik Alsmyr Erik Alsmyr · Accepted Answer · 2016-02-23T08:17:20

I suspect the amount of data per frame is fairly small. I would look at encoding it into a 2D barcode image and place it in each frame in a way so it is not removed by compression. Alternatively just encode timestamp like this.

Then on the player side you look at the image in a particular frame and get the data out or if it.

Synchronization of data with video using WebRTC

2 Answers