87
votes

WebSockets have the option of sending pings to the other end, where the other end is supposed to respond with a pong.

Upon receipt of a Ping frame, an endpoint MUST send a Pong frame in response, unless it already received a Close frame. It SHOULD respond with Pong frame as soon as is practical.

TCP offers something similar in the form of keepalive:

[Y]ou send your peer a keepalive probe packet with no data in it and the ACK flag turned on. You can do this because of the TCP/IP specifications, as a sort of duplicate ACK, and the remote endpoint will have no arguments, as TCP is a stream-oriented protocol. On the other hand, you will receive a reply from the remote host (which doesn't need to support keepalive at all, just TCP/IP), with no data and the ACK set.

I would think that TCP keepalive is more efficient, because it can be handled within the kernel without the need to transfer data up to user space, parse a websocket frame, craft a response frame, and hand that back to the kernel for transmission. It's also less network traffic.

Furthermore, WebSockets are explicitly specified to always run over TCP; they're not transport-layer agnostic, so TCP keepalive is always available:

The WebSocket Protocol is an independent TCP-based protocol.

So why would one ever want to use WebSocket ping/pong instead of TCP keepalive?

4
Actually one never uses WebSocket ping/pong because no API was created. And one never uses TCP keepalive either, for the reasons noted in the answers. This is a great example of how layering introduces complexity without solving problems: every layer has to implement the same feature, but each is useless for its own reason. So the application still has to implement its own keepalive on top of all the other layers.personal_cloud

4 Answers

89
votes

The problems with TCP keepalive are:

  1. It is off by default.
  2. It operates at two-hour intervals by default, instead of on-demand as the Ping/Pong protocol provides.
  3. It operates between proxies rather than end to end.
  4. As pointed out by @DavidSchwartz, it operates between TCP stacks, not between the applications so therefore it doesn't tell us whether the application is alive or not.

The comparison with WebSockets ping/pong isn't meaningful. TCP keepalive is automatic, and timed, when enabled, whereas WebSocket ping/pong is executed as required by the application.

46
votes

Besides the answer of EJP I think it might be also related to HTTP proxy mechanisms. Websocket connections can also run through a (HTTP) proxy server. In such cases the TCP keepalive would only check the connection up to the proxy and not the end-to-end connection.

20
votes

http://www.whatwg.org/specs/web-apps/current-work/multipage/network.html#ping-and-pong-frames

.3.4 Ping and Pong frames

The WebSocket protocol specification defines Ping and Pong frames that can be used for keep-alive, heart-beats, network status probing, latency instrumentation, and so forth. These are not currently exposed in the API.

User agents may send ping and unsolicited pong frames as desired, for example in an attempt to maintain local network NAT mappings, to detect failed connections, or to display latency metrics to the user. User agents must not use pings or unsolicited pongs to aid the server; it is assumed that servers will solicit pongs whenever appropriate for the server's needs.

WebSockets have been developed with RTC in mind, so when I look at the ping/pong functionality, I see a way of measuring latency as well. The fact that the pong must return the same payload as the ping, make it very convenient to send a timestamp, and then calculate latency from client to server or vice verse.

18
votes

TCP keepalive doesn't get passed through a web proxy. The websocket ping/pong will be forwarded by through web proxies. TCP keepalive is designed to supervise a connection between TCP endpoints. Web socket endpoints are not equal to TCP endpoints. A websocket connection can use several TCP connections between two websocket endpoints.