Media Source Extension Javascript API vis-a-vis WebRTC. Some questions

Question

The closest I came across this is this question on SO but that is just for basic understanding. My question is: when Media Source Extension (MSE) is used where the media source is fetched from a remote end point, for example, through AJAX or fetch API or even websocket, the media is sent over TCP.

That will handle packet loss and sequencing so protocol like RTP with RTCP is not used. Is that correct?
But this will result in delay so it cannot be truly used for real-time communication. Yes?
There is no security/encryption requirement for MSE like in WebRTC (DTLS/SRTP). Yes?
One cannot, for example, mix a remote audio source from MSE with an audio mediaStreamTrack from a RTCPeerConnection as they do not have any common param like CNAME (RTCP) or are part of the same mediastream). In other words, the world of MSE and WebRTC cannot mix unless synchronization is not important. Correct?

Brad Brad · Accepted Answer · 2020-04-09T16:34:55

That will handle packet loss and sequencing so protocol like RTP with RTCP is not used. Is that correct?

AJAX and Fetch are just JavaScript APIs for making HTTP requests. Web Socket is just an API and protocol extended from an initial HTTP request. HTTP uses TCP. TCP takes care of ensuring packets arrive and arrive in-order. So, yes, you won't need to worry about packet loss and such, but not because of MSE.

But this will result in delay so it cannot be truly used for real-time communication. Yes?

That depends entirely on your goals. It's a myth that TCP isn't fast, or that TCP increases general latency for every packet. What is true is that the initial 3-way handshake takes a few round trips. It's also true that if a packet does actually get dropped, the application sees latency as suddenly sharply increased until the packet is requested again and sent again.

If your goals are something like a telephony application where the loss of a packet or two is meaningless overall, then UDP is more appropriate. (In voice communications, we talk slow enough that if a few milliseconds of sound go missing, we can still decipher what was being said. Our spoken language is robust enough that if entire words get garbled or are silent, we can figure out the gist of what was being said from context.) It's also important that immediate continuity be kept for voice communications. The tradeoff is that realtime-ness is better than accuracy at any particular instant/packet.

However, if you're doing something, say a one-way stream, you might choose a protocol over TCP. In this case, it may be important to be as realtime as possible, but more important that the audio/video don't glitch out. Consider the Super Bowl, or some other large sporting event. It's a live event and important that it stays realtime. However, if the time reference for the viewer is only 3-5 seconds delayed from live, it's still "live" enough for the viewer. The viewer would be far more angry if the video glitched out and they missed something happening in the game, rather than if they were just behind a few seconds. Since it's one-way streaming and there is no communication feedback loop, the tradeoff for reliability and quality over extreme low latency makes sense.

There is no security/encryption requirement for MSE like in WebRTC (DTLS/SRTP). Yes?

MSE doesn't know or care how you get your data.

One cannot, for example, mix a remote audio source from MSE with an audio mediaStreamTrack from a RTCPeerConnection as they do not have any common param like CNAME (RTCP) or are part of the same mediastream). In other words, the world of MSE and WebRTC cannot mix unless synchronization is not important. Correct?

Mix, where? Synchronization, where? No matter what you do, if you have streams coming from different places... or even different devices without sync/gen lock, they're out of sync. However, if you can define a point of reference where you consider things "synchronized", then it's all good. You could, for example, have independent streams going into a server and the server uses its current timestamps to set everything up and distribute together via WebRTC.

How you do this, or what you do, depends on the specifics of your application.

Media Source Extension Javascript API vis-a-vis WebRTC. Some questions

1 Answers