105
votes

What is the current state of affairs on whether to use

Transfer-Encoding: gzip

or

Content-Encoding: gzip

when I want to allow clients with, e.g., limited bandwidth to signal their willingness to accept a compressed response, with the server having the final say on whether or not to compress?

The latter is what e.g. Apache's mod_deflate and IIS do if you let them take care of compression. Depending on the size of the content to be compressed, they additionally apply Transfer-Encoding: chunked.

They also include a Vary: Accept-Encoding header, which already hints at the problem. Content-Encoding seems to be part of the entity, so changing the Content-Encoding amounts to changing the entity; a request with a different Accept-Encoding header therefore means that e.g. a cache cannot use its cached version of the otherwise identical entity.
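To make the caching consequence concrete, here is a minimal sketch of how a cache might derive its lookup key once a response carries Vary: Accept-Encoding. The function name `cache_key` and the key layout are assumptions made for illustration, not how any particular proxy implements it.

```python
def cache_key(method, url, request_headers, vary):
    """Build a cache key from the request line plus every header named in Vary."""
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    varied_names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    # A missing request header is treated as an empty value.
    varied = tuple((name, headers.get(name, "")) for name in varied_names)
    return (method.upper(), url, varied)


vary = "Accept-Encoding"
plain = cache_key("GET", "/report.html", {"Accept": "text/html"}, vary)
gzipped = cache_key("GET", "/report.html", {"Accept-Encoding": "gzip"}, vary)

# Same URL, same underlying data, but two separate cache entries.
print(plain == gzipped)  # False
```

With a transfer coding the key would not need to include Accept-Encoding at all, because the stored entity would be the same in both cases.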

Is there a definitive answer on this that I have missed (and that's not buried inside a message in a long thread in some Apache newsgroup)?

My current impression is:

  • Transfer-Encoding would in fact be the right way to do what is mostly done with Content-Encoding by existing server and client implementations
  • Content-Encoding, because of its semantic implications, carries a couple of issues (what should the server do to the ETag when it transparently compresses a response?)
  • The reason Transfer-Encoding: gzip is rarely used is a chicken-and-egg problem: browsers don't support it because servers don't, and servers don't because browsers don't

So I am assuming the right way would be Transfer-Encoding: gzip (or, if I additionally chunk the body, Transfer-Encoding: gzip, chunked). There would be no reason to touch Vary, ETag, or any other header in that case, as it's a transport-level thing.
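As a rough illustration of what Transfer-Encoding: gzip, chunked means on the wire: transfer codings are listed in the order they were applied, so the receiver removes them in reverse (dechunk first, then gunzip). The sketch below uses only the standard library; the helper names (`chunk`, `dechunk`, `encode_transfer`, `decode_transfer`) are made up for this example, and trailers and chunk extensions are deliberately ignored.

```python
import gzip


def chunk(data: bytes, size: int = 16) -> bytes:
    """Apply the chunked transfer coding (no trailers, no chunk extensions)."""
    out = bytearray()
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer section
    return bytes(out)


def dechunk(data: bytes) -> bytes:
    """Remove the chunked transfer coding; stops at the zero-length chunk."""
    out = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Anything after ';' on the size line would be a chunk extension.
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break
        out += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


ENCODERS = {"gzip": gzip.compress, "chunked": chunk}
DECODERS = {"gzip": gzip.decompress, "chunked": dechunk}


def parse_codings(header_value: str) -> list:
    return [c.strip().lower() for c in header_value.split(",") if c.strip()]


def encode_transfer(body: bytes, header_value: str) -> bytes:
    """Apply the codings left to right, as listed in Transfer-Encoding."""
    for coding in parse_codings(header_value):
        body = ENCODERS[coding](body)
    return body


def decode_transfer(wire: bytes, header_value: str) -> bytes:
    """Undo the codings right to left."""
    for coding in reversed(parse_codings(header_value)):
        wire = DECODERS[coding](wire)
    return wire


body = b"the same entity, compressed only for this hop. " * 4
wire = encode_transfer(body, "gzip, chunked")
assert decode_transfer(wire, "gzip, chunked") == body
```

Nothing about the entity changes here: the body that comes out of `decode_transfer` is byte-for-byte the original, which is why entity headers would be left alone in this scheme.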

For now I don't care too much about the 'hop-by-hop'-ness of Transfer-Encoding, something that others seem to be concerned about first and foremost, because proxies might uncompress the response and forward it uncompressed to the client. However, proxies might just as well forward it as-is (compressed) if the original request has the proper Accept-Encoding header, which is a given for all browsers that I know of.

Btw, this issue is at least a decade old, see e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=68517.

Any clarification on this will be appreciated, both in terms of what is considered standards-compliant and what is considered practical. For example, HTTP client libraries that only support transparent "Content-Encoding" would be an argument against practicality.

2
Just ran into this. Curl on PHP 5.3 doesn't understand Transfer-Encoding: gzip, although command line curl does. To be on the safe side, send both, unless you're combining chunked and gzip. - Seva Alekseyev
@SevaAlekseyev sending both would be very wrong: clients might try to decompress twice. - Joshua Wise
This is something that's bugged me forever, too (question I asked)… per one of the answers to the question that @JoLiss cited, there's a perfectly logical, semantically coherent, and standards-compliant way to compress request/response bodies… and basically no clients/servers use or support it. 🤦🏻 - Dan Lenski

2 Answers

36
votes

Quoting Roy T. Fielding, one of the authors of RFC 2616:

changing content-encoding on the fly in an inconsistent manner (neither "never" nor "always") makes it impossible for later requests regarding that content (e.g., PUT or conditional GET) to be handled correctly. This is, of course, why performing on-the-fly content-encoding is a stupid idea, and why I added Transfer-Encoding to HTTP as the proper way to do on-the-fly encoding without changing the resource.

Source: https://issues.apache.org/bugzilla/show_bug.cgi?id=39727#c31

In other words: Don't do on-the-fly Content-Encoding, use Transfer-Encoding instead!

Edit: That is, unless you want to serve gzipped content to clients that only understand Content-Encoding, which unfortunately seems to be most of them. But be aware that you then leave the realm of the spec and might run into issues such as the one mentioned by Fielding, as well as others, e.g. when caching proxies are involved.

33
votes

The correct usage, as defined in RFC 2616 and actually implemented in the wild, is for the client to send an Accept-Encoding request header (the client may specify multiple encodings). The server may then, and only then, encode the response according to the client's supported encodings (if the file data is not already stored in that encoding) and indicate in the Content-Encoding response header which encoding is being used. The client can then read data off the socket based on the Transfer-Encoding (e.g. chunked) and then decode it based on the Content-Encoding (e.g. gzip).

So, in your case, the client would send an Accept-Encoding: gzip request header, and the server may then decide to compress the data (if it is not already compressed) and send a Content-Encoding: gzip response header, optionally along with Transfer-Encoding: chunked.
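A minimal sketch of that negotiation on the server side, assuming a simplified Accept-Encoding parser (it honours q-values but ignores the `*` wildcard). The function names and the `min_size` threshold are hypothetical and only there to show the server keeping the final say.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if Accept-Encoding lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        return q > 0
    return False


def build_response(body: bytes, accept_encoding: str, min_size: int = 256):
    """Compress only if the client allows it and the server considers it worthwhile."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding) and len(body) >= min_size:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


headers, payload = build_response(b"a" * 1000, "gzip, deflate")
assert headers["Content-Encoding"] == "gzip"
assert gzip.decompress(payload) == b"a" * 1000
```

A client that sends `Accept-Encoding: gzip;q=0`, or no Accept-Encoding header at all, gets the body back unchanged and without a Content-Encoding header.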

And yes, the Transfer-Encoding header can be used in requests, but only with HTTP/1.1, which requires that both client and server implementations support the chunked encoding in both directions.

ETag uniquely identifies the resource data on the server, not the data actually being transmitted. If a given URL resource changes its ETag value, it means the server-side data for that resource has changed.