6
votes

Reading about Webrtc i get the feeling that "it will drop server bandwidth usage dramatically" except for "a few corner enterprise-firewall cases" where one needs a TURN server which relays the whole traffic between the peers.

For example, altough not webrtc related but the idea is similar, the wikipedia article of Chatroulette states: The website uses Adobe Flash to display video and access the user's webcam. Flash's peer-to-peer network capabilities (via RTMFP) allow almost all video and audio streams to travel directly between user computers, without using server bandwidth. However, certain combinations of routers will not allow UDP traffic to flow between them, and then it is necessary to fall back to RTMP.

Also similar articles on Webrtc focus on "yeah there might be problems with firewalls so you need a TURN server but ignore this and look at my awesome PeerConnection javascript code".

What i don't understand:

A Connection between two peers requires a server socket to be open so the peers can connect to it. Even UDP requires the concept of a udp server socket. Since nearly all not-server internet connected peers are behind some kind of router. E.g. every smartphone uses a wifi router, desktop PC's use the router of the service provider, ... It shouldnt be possible to connect to a server socket hosted on a smartphone (browser webrtc server socket) or desktop cause of the router/firewall.

Thus my understanding is practically no two peers which need to send their traffic through the internet will be able to use a direct P2P connection, right? So the only useful case to use Webrtc is in a LAN like environment, right? Furtherly in case of a video chat service like chatroulette based on webrtc would need to use a bunch of TURN servers to relay nearly ALL traffic. Which makes Webrtc equally costly regarding server bandwidth like hosting my own solution.

So my question is: Am i right? If not what is the technical detail that allows a PeerConnection to be used without a TURN server but for two nodes separated by the Internet? How is the connection established on Layer 4 the TCP/UDP Transport Layer? Is it using UDP and all wifi routers allow hosting UDP server sockets or such? Which wouldnt make much sense cause of NAT and security.

UPDATE 1: Digging a bit further i found what "symmetric nat" means and what it has to do with enterprises: In most enterprises it seems that the device connected to the internet has symmetric nat implemented. This means that the routing table which maps internal "internal-ip:internal-port" tuples to "internet-ip:internet-port" also stores "destination-ip:destination-port". So such routes/nats store a table for every (tcp?) connection having 6 columns "internal-ip:internal-port:internet-ip:internet-port:destination-ip:destination-port". This means no one else but the destination is allowed to communicate with internal-ip:internal-port. Whereas non-enterprise-routers seem to only store the "internal-ip:internal-port:internet-ip:internet-port" combination. Thats also what is meant as "poke a hole in the firewall".

1

1 Answers

8
votes

You're not right. All peers have IP addresses in order to communicate, and can be reached on those same addresses, provided a firewall allows it.

NATs tend to be optimized for client-initiated client-server traffic only. That typically means they initially allow outbound traffic only, and only allow inbound traffic on the same line after outbound traffic has happened. Perfect for servers. See this WebRTCHacks article for an intro to the problem.

This is where ICE comes in to attempt to poke holes in the firewall from the inside (client-side), in order to establish a line of communication directly between two peers, without needing any "server" socket, whatever that means.

How ICE works is quite complicated, and is explained in detail in the RFC.

But in broad terms it works in a number of steps:

  1. Each peer (e.g. browser) has an "ICE agent" that collects candidates. Candidates are addresses (IP:port numbers) at which this peer can be reached, including:
    1. Host candidates: e.g. immediate LAN/wifi/VPN IPs of the machine.
    2. Server-reflexive candidates: public (outside-NAT) addresses of the machine, obtained by bouncing requests off mirroring (STUN) servers on the internet.
    3. Relay candidates: addresses to a shared TURN server to forward data if all else fails.
  2. Once discovered, candidates are inserted into the local SDP, and trickled over the signaling channel to the other peer, where they are inserted in it's remote description, where the other agent sees them.
  3. Once an ICE agent has both local and remote candidates, it starts pairing local and remote candidates, and checks them for connectivity by sending STUN requests on them (effectively attempts at reaching the peer).
  4. Successful pairs are ones both ICE agents have gotten a response back on (a 4-way handshake if you will).
  5. If there's more than one successful pair, they're sorted by some metric, and the best pair becomes selected.
  6. The selected pair is then used to send media over. One pair is needed for each track of (video or audio) media.
  7. If a better pair is found later, the selected pair may change, affecting what address media is sent on.

TURN should only be needed in cases either where both clients are behind symmetrical NATs, or UDP traffic is blocked entirely.