34
votes

Yesterday I had a discussion with my colleagues about HTTP, and the question came up of why HTTP is designed as a plain-text protocol. Surely it could have been designed as a binary protocol, just like TCP, using flags to represent the different methods (POST, GET) and variables (HTTP headers). So why was HTTP designed this way? Are there technical or historical reasons?


10 Answers

58
votes

A reason that's both technical and historical is that text protocols are almost always preferred in the Unix world.

Well, this is not really a reason but a pattern. The rationale behind it is that text protocols let you see what's going on on the network just by dumping everything that goes through. You don't need a specialized analyzer, as you do for TCP/IP. This makes them easier to debug and easier to maintain.
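To illustrate the "just dump it" point, here is a minimal sketch of a naive traffic dump that prints printable ASCII as-is and escapes everything else. The `dump` helper and the sample bytes are hypothetical; the TCP header is a hand-packed 20-byte example, not captured traffic.

```python
import struct

def dump(data: bytes) -> str:
    """Naive traffic dump: printable ASCII as-is, CR/LF shown as \\r and \\n, other bytes as \\xNN."""
    out = []
    for b in data:
        if b == 0x0D:
            out.append("\\r")
        elif b == 0x0A:
            out.append("\\n\n")  # show the LF, then break the line for readability
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)

http_request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"\r\n"
)
# Hand-packed 20-byte TCP header (ports 54321 -> 80, SYN flag); values are arbitrary.
tcp_header = struct.pack("!HHIIBBHHH", 54321, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)

print(dump(http_request))  # readable as-is
print(dump(tcp_header))    # mostly escapes; meaningless without the field layout
```

The HTTP request reads like the request you would type by hand, while the TCP header only makes sense once a tool applies the field layout to it.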

Not only HTTP, but many protocols are text based (e.g., FTP, POP3, SMTP, IMAP).

You might want to take a look at The Art of Unix Programming for a much more detailed explanation of this Unix thing.

19
votes

With HTTP, the content of a request is almost always orders of magnitude larger than the protocol overhead. Converting the protocol into a binary one would save very little bandwidth, and the easy debuggability that a text protocol offers easily trumps the minor bandwidth savings of a binary protocol.
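A rough way to see this trade-off is to encode the same request head both ways. The sketch below compares an HTTP/1.1 text head with a made-up compact binary encoding (the `METHODS` and `HEADER_IDS` code tables are hypothetical, not any real protocol) and relates the saving to an assumed 50 KB payload.

```python
import struct

# Hypothetical code tables for a made-up binary encoding; not any real protocol.
METHODS = {"GET": 1, "POST": 2}
HEADER_IDS = {"Host": 1, "User-Agent": 2, "Accept": 3}

def text_head(method: str, path: str, headers: dict) -> bytes:
    # HTTP/1.1 style: request line, "Name: value" lines, blank line, all CRLF-terminated.
    lines = [f"{method} {path} HTTP/1.1"] + [f"{k}: {v}" for k, v in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def binary_head(method: str, path: str, headers: dict) -> bytes:
    # 1-byte method code, then a 2-byte length-prefixed path.
    out = struct.pack("!BH", METHODS[method], len(path)) + path.encode("ascii")
    for name, value in headers.items():
        # 1-byte header id + 2-byte length + raw value bytes.
        out += struct.pack("!BH", HEADER_IDS[name], len(value)) + value.encode("ascii")
    return out

headers = {"Host": "example.com", "User-Agent": "demo/1.0", "Accept": "text/html"}
text = text_head("GET", "/index.html", headers)
binary = binary_head("GET", "/index.html", headers)
body_size = 50_000  # assumed payload size in bytes, purely illustrative

saved = len(text) - len(binary)
print(f"text head: {len(text)} B, binary head: {len(binary)} B, saved: {saved} B")
print(f"saving relative to head + payload: {saved / (len(text) + body_size):.2%}")
```

The absolute saving is a few dozen bytes per message; whether that matters depends on the payload size you assume.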

9
votes

Many Internet application protocols use more or less plain text for the protocol (see FTP, POP, SMTP, etc.).

It makes interoperability and troubleshooting much easier.

8
votes

HTTP stands for "Hypertext Transfer Protocol".

It was initially devised as a way to serve text documents, hence the text based protocol.

What we do with HTTP now is far beyond its original intent.

5
votes

As with RFC 2616 section 3.7.1 for HTTP/1.1, the key delimiter for a command or header line is the CRLF line break. Text-based application protocols make it easier to carry out a conversation (for troubleshooting) purely with a Telnet client. They also make it easier to program with ReadLine() calls and text string matching.
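A minimal sketch of that ReadLine()-style parsing in Python: each `readline()` call returns one line, and the empty line ends the header section. It deliberately ignores obsolete line folding, repeated headers, and length limits, and uses `io.BytesIO` as a stand-in for a socket stream; `read_request_head` is a hypothetical helper name.

```python
import io

def read_request_head(stream):
    """Read an HTTP/1.1 request line and headers from a binary file-like object."""
    request_line = stream.readline().decode("ascii").rstrip("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    while True:
        line = stream.readline().decode("latin-1").rstrip("\r\n")
        if line == "":  # a bare CRLF (or EOF) ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers

# In a real server the stream could come from sock.makefile("rb"); BytesIO stands in here.
raw = io.BytesIO(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nX-Custom-Thing: anything\r\n\r\n")
print(read_request_head(raw))
```

Note that the unknown `X-Custom-Thing` header is picked up without any change to the parser.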

The CRLF line break also allows near-unlimited arbitrary header extensions, unlike fixed-size TCP or IP headers, where each field is hard-coded at a bit offset.
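To make the contrast concrete, the sketch below reads the fixed 20-byte part of a TCP header with `struct` at hard-coded offsets, next to a text header parser that accepts any `Name: value` line. Both helpers are illustrative; the TCP sample is hand-packed, and TCP options (which follow the fixed 20 bytes) are ignored.

```python
import struct

# Fixed 20-byte TCP header layout: every field sits at a hard-coded offset.
TCP_FIXED = struct.Struct("!HHIIBBHHH")

def parse_tcp_fixed(data: bytes) -> dict:
    src, dst, seq, ack, offset_byte, flags, window, checksum, urgent = TCP_FIXED.unpack_from(data)
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "data_offset": offset_byte >> 4,  # header length in 32-bit words
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }

def parse_text_headers(lines):
    # Any "Name: value" line is accepted, so a new header needs no layout change.
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

segment = TCP_FIXED.pack(54321, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
print(parse_tcp_fixed(segment))
print(parse_text_headers(["Host: example.com", "X-Brand-New-Extension: yes"]))
```

Adding a field to the binary layout means changing `TCP_FIXED` and every parser that relies on it; adding a text header means sending one more line.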

3
votes

So it's easier to "read" the traffic or create a client or server?

You can debate whether it actually makes it easier, but surely that was the intent.

3
votes

In the case of HTTP, some people have worked on a "binary" version of it, called Embedded Binary HTTP (EBHTTP):

https://datatracker.ietf.org/doc/html/draft-tolle-core-ebhttp-00

1
votes

Historically, it all starts with RFC 822 (STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES), whose latest successor is RFC 5322 (Internet Message Format). SMTP (RFC 821) was one of the most popular protocols based on RFC 822, and HTTP was born out of SMTP (your mail protocol).

1
votes

I like the:

...preferred in the Unix world.

reason, but it doesn't explain why.

To understand why, you need to put yourself in the shoes of a designer who wants to make a usable product.

A) You can document the shit out of meaningless gibberish (binary).

B) Develop or hope others develop tools that portray your meaningless gibberish in a meaningful way.

or

A) You can document the shit out of meaningful text that takes advantage of language as a tool for a self-documenting protocol.

B) There is no immediate need for additional tools, and additional tools will be much easier to write and debug.

It enables staged delivery and produces something that is easier to comprehend and recall when doing future development. It also makes a higher-level abstraction unnecessary.

Imagine a world where setting a header value isn't as simple as a dictionary/Map somewhere in your framework. When running into errors, you'd constantly have to question whether your framework is correct, because you couldn't easily see that it's doing the right thing without additional tools. That would be the world of HTTP if each framework needed to invent and implement its own higher-level abstraction (browsers come to mind).

Many protocol designers want efficiency; this design focuses on usability, which is paramount in the software development industry. Unusable, prematurely optimized tools create an unnecessary burden for software developers, and this burden manifests across the board.

0
votes

HTTP/2 is now binary-based, which makes it much less error-prone.

https://http2.github.io/faq/#why-is-http2-binary
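For comparison, every HTTP/2 frame starts with a fixed 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier, per RFC 9113 section 4.1), so a parser reads exact byte counts instead of scanning for CRLF and trimming whitespace. This is a minimal sketch: the frame bytes are hand-made samples, and the header block itself (HPACK-compressed) is not decoded.

```python
import struct

# HTTP/2 frame type codes (RFC 9113, section 6).
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM", 0x4: "SETTINGS",
    0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY", 0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}

def parse_frame_header(data: bytes):
    """Parse the fixed 9-byte HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes")
    length_hi, length_lo, frame_type, flags, stream_id = struct.unpack("!BHBBI", data[:9])
    length = (length_hi << 16) | length_lo  # 24-bit payload length
    stream_id &= 0x7FFFFFFF                 # drop the reserved bit
    return length, FRAME_TYPES.get(frame_type, f"0x{frame_type:02x}"), flags, stream_id

print(parse_frame_header(b"\x00\x00\x00\x04\x00\x00\x00\x00\x00"))  # (0, 'SETTINGS', 0, 0)
print(parse_frame_header(b"\x00\x00\x0d\x01\x04\x00\x00\x00\x01"))  # (13, 'HEADERS', 4, 1)
```

There is no whitespace, capitalization, or line-ending variation to handle here, which is the kind of ambiguity the FAQ refers to.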