Why use output buffering in PHP?

Question

I have read quite a bit of material on the Internet where different authors suggest using output buffering. The funny thing is that most authors argue for its use only because it allows for mixing response headers with actual content. Frankly, I think that responsible web applications should not mix outputting headers and content, and web developers should look for possible logical flaws in their scripts which result in headers being sent after output has been generated. This is my first argument against the ob_* output buffering API. Even that little convenience you get - mixing headers with output - is not a good enough reason to use it, unless one needs to hack up scripts fast, which is usually neither the goal nor the way in a serious web application.

Also, I think most people dealing with the output buffering API do not think about the fact that even without explicit output buffering enabled, PHP in combination with the web-server it is plugged into still does some internal buffering anyway. It is easy to check - do an echo of some short string, sleep for say 10 seconds, and do another echo. Request your script with a browser and watch the blank page pause for 10 seconds, with both lines appearing thereafter. Before some say that it is a rendering artefact, not traffic: tracing the actual traffic between the client and the server shows that the server has generated the Content-Length header with an appropriate value for the entire output - suggesting that the output was not sent progressively with each echo call, but accumulated in some buffer and then sent on script termination. This is one of my gripes with explicit output buffering - why do we need two different output buffer implementations on top of one another? Might it be because the internal (inaccessible) PHP/web-server output buffering is subject to conditions a PHP developer cannot control, and is thus not really usable?
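The experiment described above can be sketched roughly as follows. The function name `delayedEcho` and its parameter are made up for illustration; whether both lines really arrive together depends on the SAPI and web-server configuration, so treat this as something to try rather than a guaranteed result.

```php
<?php
// Sketch of the experiment from the question. delayedEcho() and its
// $seconds parameter are illustrative names, not part of any PHP API.
function delayedEcho(int $seconds = 10): void
{
    echo "first line\n";
    sleep($seconds);       // no explicit flush() here: PHP and the server decide when bytes leave
    echo "second line\n";
}
```

To repeat the test, call `delayedEcho()` at the end of the script, request it with a browser, and trace the traffic to see whether the first line arrives before the pause.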

In any case, I, for one, am starting to think one should avoid explicit output buffering (the series of ob_* functions) and rely on the implicit one, assisting it with the good old flush function when necessary. Maybe if there was some guarantee from the web server to actually send output to the client with each echo/print call, then it would be useful to set up explicit buffering - after all, one does not want to send the response to the client in 100-byte chunks. But the alternative with two buffers seems like a somewhat useless layer of abstraction.

So, ultimately, do serious web applications need output buffering?

I'm so impressed that the first answers came in about 3 min. after the question was asked. That's some speedy reading! — troelskn
@Chacha102: and @troelskn: Wow, the Internet has really destroyed your ability to read, hasn't it? It's really not that much to read. And in my opinion, a "wall of text" doesn't feature such nice things as paragraph breaks. I hate to give you two (and the upvoters) a hard time, but we should be praising people who take the time to elaborate on their questions rather than mocking them. If your attention span is that short, why respond? — eyelidlessness
I thought Stack Overflow was for questions with answers, not debates...? — Martin Bean
I do write perhaps a bit too much, you are right. To my defense I would say I would rather over-explain myself in a single question than spam the question list with a question that is too vague and needs clarification. In any case, Chacha102, I am sorry you have wasted 2 minutes of your life. Better sense of judgement next time, after all no one asked you to read my wall of text. — amn
@eyelid, Given that I've read it twice might lend to the fact that I don't have a short attention span. I didn't mean the comment to be derogatory, but obviously you aren't able to detect sarcasm or humor on the internet. Maybe your ability to detect humor has been destroyed by the internet. — Tyler Carter

David Bullock · Accepted Answer · 2013-07-05T06:09:44

Yes

Serious web applications need output buffering in one specific situation:

Your application wants control over what is output by some 3rd-party code, but there is no API to control what that code emits.

In that scenario, you can call ob_start() just before handing control to that code, mess around with what is written (ideally with the callback, or by examining the buffer contents if you must), and then call ob_flush().
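A minimal sketch of that scenario, using the callback form of ob_start(). The names `thirdPartyWidget` and `renderFiltered` are hypothetical stand-ins, and the sketch ends with ob_end_flush() rather than ob_flush() so that the buffer is closed as well as flushed.

```php
<?php
// Hypothetical stand-in for third-party code that echoes directly
// and offers no API for controlling what it emits.
function thirdPartyWidget(): void
{
    echo "<p>Hello from the vendor widget</p>";
}

// Run $emitter and pass everything it writes through $filter before the
// output continues on its way. Both parameter names are illustrative.
function renderFiltered(callable $emitter, callable $filter): void
{
    // The callback receives the buffer contents when the buffer is flushed
    // or closed; whatever it returns is what gets passed on.
    ob_start(function (string $buffer) use ($filter): string {
        return $filter($buffer);
    });
    try {
        $emitter();
    } finally {
        ob_end_flush(); // run the callback, pass the result on, close the buffer
    }
}

// Example: rewrite a word in markup we do not control.
renderFiltered(
    'thirdPartyWidget',
    fn (string $html): string => str_replace('vendor', 'embedded', $html)
);
```

The closure wrapper is a design choice: PHP hands the callback a second argument (the buffering phase), which a user-defined closure simply ignores.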

Ultimately, PHP's ob_* functions are a mechanism for capturing what some other bit of code does into a buffer you can mess with.

If you don't need to inspect or modify what is written to the buffer, there is nothing gained by using ob_start().

Quite likely, your 'serious application' is in fact a framework of some kind.

You already have output buffering, anyway

You don't need ob_start() in order to make use of output buffering. Your web-server already does buffer your output.

Using ob_start() does not get you better output buffering - it could in fact increase your application's memory usage and latency by 'hoarding' data which the web-server would otherwise have sent to the client already.

Maybe `ob_start()` ...

... for convenience when flushing

In some cases, you may want control over when the web-server flushes its buffer, based on some criteria which your application knows best. Most of the time, you know that you just finished writing a logical 'unit' which the client can make use of, and you're telling the web-server to flush now and not wait for the output buffer to fill up. To do this, it is simply necessary to emit your output as normal, and punctuate it with flush().
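As a rough illustration of punctuating output with flush(), assuming each array element is one logical 'unit' (the function name `emitUnits` is made up for this sketch):

```php
<?php
// Emit each logical unit as normal, then ask PHP to push it onwards.
// emitUnits() and $units are illustrative names.
function emitUnits(iterable $units): void
{
    foreach ($units as $unit) {
        echo $unit, "\n";
        // A request, not a guarantee: the web server or a proxy may still
        // buffer. flush() also does not empty buffers opened with ob_start().
        flush();
    }
}

emitUnits([
    '<section>first result</section>',
    '<section>second result</section>',
]);
```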

More rarely, you will want to withhold data from the web-server until you have enough data to send. No point interrupting the client with half of the news, especially if the rest of the news will take some time to become available. A simple ob_start(), later concluded by ob_end_flush(), may indeed be the simplest and most appropriate thing to do.
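A sketch of withholding output until the whole unit is ready. The function name `emitWhenComplete` and the `$slowSource` callable are assumptions for illustration; discarding the partial output on failure is an addition of this sketch, not something the answer prescribes.

```php
<?php
// Hold output back until the slow part has finished, then release it at once.
// emitWhenComplete() and $slowSource are illustrative names.
function emitWhenComplete(callable $slowSource): void
{
    ob_start();                  // start withholding output
    try {
        echo "<article>";
        echo $slowSource();      // may take a while to become available
        echo "</article>\n";
    } catch (Throwable $e) {
        ob_end_clean();          // discard the half-written unit
        throw $e;
    }
    ob_end_flush();              // release the complete unit in one go
}

emitWhenComplete(fn (): string => 'the rest of the news');
```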

... if you have responsibility for certain headers

If your application is taking responsibility for calculating headers which can only be determined after the full response is available, then using ob_start() may be acceptable.

However, even here, if you can't do any better than deriving the header by inspecting the complete output buffer, you might as well let the web-server do it (if it will). The web-server's code is written, tested, and compiled - you are unlikely to improve on it.

For example, it would only be useful to set the Content-Length header if your application knows the length of the response body before it computes the response body.
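One case where the length is known before the body is produced is serving an existing file. The sketch below assumes that case; the function name `sendFile` and the chosen Content-Type are illustrative.

```php
<?php
// Serve a file whose size is known up front, so Content-Length can be set
// before any of the body is written. sendFile() is an illustrative name.
function sendFile(string $path): void
{
    if (!is_file($path)) {
        http_response_code(404);
        return;
    }
    // Headers first: the length comes from the file system, not from
    // inspecting an output buffer.
    header('Content-Type: application/octet-stream');
    header('Content-Length: ' . filesize($path));
    readfile($path); // body second; no ob_start() needed
}
```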

No panacea for bad practices

You should not use ob_start() to avoid the disciplines of:

opening, using and quickly closing resources such as memory, threads and database connections
emitting headers first, and the body second
doing all the calculations and error handling you can, before beginning the response

If you use ob_start() to dodge these disciplines, you will accumulate technical debt which will make you cry one day.
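As one possible shape for the last two disciplines, the sketch below does the calculation and error handling first and only then emits headers followed by the body. The names `buildReport` and `respond`, and the row format, are invented for illustration.

```php
<?php
// Step 1: compute everything, including the error case, before responding.
// Returns [status code, body]. buildReport() is an illustrative name.
function buildReport(array $rows): array
{
    if ($rows === []) {
        return [404, "No data\n"];
    }
    $lines = array_map(
        fn (array $row): string => $row['name'] . ': ' . $row['total'],
        $rows
    );
    return [200, implode("\n", $lines) . "\n"];
}

// Step 2: headers first, body second.
function respond(int $status, string $body): void
{
    http_response_code($status);
    header('Content-Type: text/plain; charset=utf-8');
    echo $body;
}

[$status, $body] = buildReport([['name' => 'widgets', 'total' => 3]]);
respond($status, $body);
```

Because the status is decided before anything is written, no output buffer is needed to keep the headers changeable.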