5
votes

I've been struggling with an application I'm writing and I think I'm beginning to see that my problem is premature optimization. The perfectionist side of me wants to make everything optimal and perfect the first time through, but I'm finding this is complicating the design quite a bit. Instead of writing small, testable functions that do one simple thing well, I'm leaning towards cramming in as much functionality as possible in order to be more efficient.

For example, I'm avoiding multiple trips to the database for the same piece of information at the cost of my code becoming more complex. One part of me wants to just not worry about redundant database calls. It would make it easier to write correct code and the amount of data being fetched is small anyway. The other part of me feels very dirty and unclean doing this. :-)

I'm leaning towards just going to the database multiple times, which I think is the right move here. It's more important that I finish the project and I feel like I'm getting hung up because of optimizations like this. My question is: is this the right strategy to be using when avoiding premature optimization?

3
There's been quite a lot of discussion about this before: stackoverflow.com/search?q=premature+optimization - Alex Korban
If you tell us what language and database you're using, people might be able to give illustrative examples. I realise your question is general, but examples often help. - detly
@detly: It's a web app backed by php/mysql. - Ed Mazur

3 Answers

20
votes

This is the right strategy in general. Get the code working first, thoroughly covered by automated tests.

You can then run the automated tests while the program is under control of a profiler, to find out where the program is spending time and/or memory. That will show you where to optimize.

And you will be optimizing working code, not code that may or may not work when it's all put together.
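As a rough sketch of what running the tests under a profiler can look like, here is a Python version using the standard `cProfile` and `unittest` modules. The function `total_price` and its test are hypothetical placeholders, not anything from the question; the same idea applies with whatever profiler your own stack provides.

```python
import cProfile
import io
import pstats
import unittest


def total_price(prices):
    """Hypothetical function under test: sums a list of prices."""
    return sum(prices)


class TotalPriceTest(unittest.TestCase):
    def test_total(self):
        self.assertEqual(total_price([1, 2, 3]), 6)


def profile_tests():
    """Run the test case under cProfile; return (test result, report text)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalPriceTest)
    profiler = cProfile.Profile()
    profiler.enable()
    # Send the runner's own output to a buffer so only the report is shown.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    profiler.disable()

    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, report.getvalue()


if __name__ == "__main__":
    result, report = profile_tests()
    print("tests passed:", result.wasSuccessful())
    print(report)
```

The report lists where time went while the tests ran, which is the evidence to look at before deciding what to optimize.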

You don't want code that fails optimally.


The quote I was failing to remember is from Mich Ravera:

If it doesn't work, it doesn't matter how fast it doesn't work.

5
votes

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. -- Knuth (sometimes attributed to Hoare)

While @John Saunders nails it, applying TDD alone might not completely address your concerns. I adhere to TDD, and when you do TDD correctly and can apply refactoring effectively, you typically end up with much leaner code, with the added benefit that you know it works. No arguments there.

However, I see too many developers write performance-ignorant code. Avoiding premature optimization is no excuse for writing sloppy / lazy / naive code, and writing unit tests doesn't prevent it, although someone who writes unit tests is probably a better coder, and better coders are less apt to write bad code.

Do write tests, and do include performance tests in your suite for the scenarios your stakeholders identify, e.g. retrieve 100 discounted products for a specific vendor, including stocking levels, formatted as XML, in under 3 seconds.
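A minimal sketch of such a performance test, written in Python with `unittest`. The function `fetch_discounted_products` is a hypothetical stand-in for the real query-and-format code, and the vendor id is made up; only the 100-product count and the 3-second budget come from the scenario above.

```python
import time
import unittest


def fetch_discounted_products(vendor_id, limit=100):
    """Hypothetical stand-in: the real version would query and format as XML."""
    return [
        "<product vendor='%d' id='%d' stock='%d'/>" % (vendor_id, i, i % 7)
        for i in range(limit)
    ]


class DiscountedProductsPerformanceTest(unittest.TestCase):
    BUDGET_SECONDS = 3.0  # the figure from the stakeholder scenario

    def test_returns_100_products_within_budget(self):
        start = time.perf_counter()
        products = fetch_discounted_products(vendor_id=42, limit=100)
        elapsed = time.perf_counter() - start

        # Check correctness and the time budget in the same test.
        self.assertEqual(len(products), 100)
        self.assertLess(elapsed, self.BUDGET_SECONDS)


if __name__ == "__main__":
    unittest.main()
```

A wall-clock assertion like this is sensitive to the machine it runs on, so it works best against a realistic data set in a stable environment.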

The fallacy that "premature optimization" is the same thing as "concern about performance" should not guide software development. -- Randall Hyde

If you leave performance concerns too late, you may find it's too hard, or too costly to change.


1
votes

The key aspect of Knuth's quote to me is "penny-wise and pound-foolish". That's how he ultimately described the premature optimizer -- someone haggling over saving pennies when there are pounds to be saved, and struggling to maintain their "optimized" (pay careful attention to how he used quotes here) software.

I find a lot of people often only quote a small part of Knuth's paper. It's worth realizing his paper was arguing to use goto in order to speed up critical execution paths in software.

A fuller quote:

[...] this is a noticeable saving in the overall running speed, if, say, the average value of n is about 20, and if the search routine is performed about a million or so times in the program. Such loop optimizations [using gotos] are not difficult to learn and, as I have said, they are appropriate in just a small part of a program, yet they often yield substantial savings. [...]

The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a oneshot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say 97% of the time; premature optimization is the root of all evil.

It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I've become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.

After a programmer knows which parts of his routines are really important, a transformation like doubling up loops will be worthwhile. Note that this transformation introduces go to statements -- and so do several other loop optimizations.

So this is coming from someone who was actually deeply concerned with performance at the micro-level, and at the time (optimizers have gotten far better now), was utilizing goto for speed.

At the heart of Knuth's characterization of the "premature optimizer" is:

  1. Optimizing based on hunches/superstitions/human intuitions with no past experience or measurements (optimizing blindly without actually knowing what you are doing).
  2. Optimizing in a way that saves pennies over pounds (ineffective optimizations).
  3. Seeking some absolute ultimate peak of efficiency for everything.
  4. Seeking efficiency in non-critical paths.
  5. Trying to optimize when you can barely maintain/debug your code.

None of this has to do with the timing of your optimizations, but with experience and understanding -- from understanding critical paths to understanding what actually delivers performance.

Things like test-driven development and a predominant focus on interface design weren't covered in Knuth's paper. These are more modern concepts and ideas. He was focused on implementation mostly.

Nevertheless, it's a good update to Knuth's advice -- to seek to establish correctness first through testing, and interface designs which leave you room to optimize without breaking everything.

If we try to apply a modern interpretation of Knuth, I would add "ship" in there. Even if you're optimizing the true critical paths of your software with measured gains, the fastest software in the world is worthless if it never ships. Keeping that in mind should help you make smarter compromises.

I'm leaning towards just going to the database multiple times, which I think is the right move here. It's more important that I finish the project and I feel like I'm getting hung up because of optimizations like this. My question is: is this the right strategy to be using when avoiding premature optimization?

It's going to be kind of up to you to develop the best judgement, taking into consideration some of these points above, as you most intimately understand your own requirements.

One crucial suggestion: if this is a performance-critical path dealing with a heavy load, design your public interfaces in a way that leaves plenty of room to optimize.

For example, don't design a particle system with client dependencies on a Particle interface. That leaves no room to optimize, since you only have the encapsulated state and the implementation of a single particle to work with. In that case, you might have to make cascading changes to your codebase in order to optimize. A race car cannot utilize its speed if the road is only 10 meters long. Instead, design towards a ParticleSystem interface which aggregates a million particles, e.g., with higher-level operations that deal with particles in bulk when possible. That leaves you plenty of room to optimize without breaking your designs if you find you need to.
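To make the difference concrete, here is a small Python sketch of a bulk-oriented interface. The method names (`spawn`, `advance`, `positions`) and the one-dimensional state are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParticleSystem:
    """Hypothetical bulk interface: callers never hold a single particle."""

    # The storage layout is a private detail; here, parallel lists of floats.
    _x: List[float] = field(default_factory=list)
    _vx: List[float] = field(default_factory=list)

    def spawn(self, x: float, vx: float) -> None:
        """Add one particle with position x and velocity vx."""
        self._x.append(x)
        self._vx.append(vx)

    def advance(self, dt: float) -> None:
        """Move every particle at once by its velocity times dt."""
        self._x = [x + vx * dt for x, vx in zip(self._x, self._vx)]

    def positions(self) -> List[float]:
        """Return a copy so callers cannot depend on the internal storage."""
        return list(self._x)


if __name__ == "__main__":
    system = ParticleSystem()
    system.spawn(0.0, 2.0)
    system.spawn(1.0, -1.0)
    system.advance(0.5)
    print(system.positions())
```

Because clients only call `advance` on the whole system, the plain loop inside it could later be replaced by a vectorized or parallel implementation without touching any calling code.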

The perfectionist side of me wants to make everything optimal and perfect the first time through, but I'm finding this is complicating the design quite a bit.

Now this part does sound a bit premature. Generally your first pass should aim for simplicity. Simplicity often goes hand in hand with reasonably fast, faster than you might think, even if you are doing some redundant work.
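As an illustration of that simple first pass, here is a Python sketch in which two small functions each fetch the same row. An in-memory SQLite database stands in for the real one, and the table, columns, and function names are all made up for the example.

```python
import sqlite3


def make_db():
    """In-memory SQLite stands in for the real database (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")
    return conn


def get_user(conn, user_id):
    """One small function, one query."""
    row = conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"id": row[0], "name": row[1], "email": row[2]}


def greeting(conn, user_id):
    return "Hello, %s!" % get_user(conn, user_id)["name"]


def contact_line(conn, user_id):
    # Fetches the same row again: redundant, but easy to test and reason about.
    return "Contact: %s" % get_user(conn, user_id)["email"]


if __name__ == "__main__":
    conn = make_db()
    print(greeting(conn, 1))
    print(contact_line(conn, 1))
```

If measurements later show the repeated query matters, a cache can be added inside `get_user` without changing either caller.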

Anyway, I hope these points at least give you a few more things to consider.