4
votes

I'm learning SQL at the moment and I've read that joins and subqueries can potentially be performance destroyers. I (somewhat) know the theory about algorithmic complexity in procedural programming languages and try to be mindful of that when programming, but I don't know how expensive different SQL queries can be. I'm deciding whether I should invest time in learning about SQL performance or just notice it when my queries run slow. The base question for me then is: is premature optimization for SQL as evil as it is for procedural languages?

As added information, I work in an environment where, most of the time, high performance is not an issue and the biggest tables I have to work with have some 150k rows.

Here's the Donald Knuth quote I refer to when saying "evil":

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

6
Yes. Worry about the problems you have, not about ones you might theoretically encounter someday (especially if performance is not an issue).Piskvor left the building

6 Answers

3
votes

I would say that some general notions about performance are a must-have : it'll prevent you from writing really bad queries that can hurt your application (Even if you don't have millions of rows in your tables).

It'll also help you design your database so it's more officient-oriented : you'll have some ideas about where to put indexes, for instance.

But you shouldn't have performance as a first goal : the first thing is to have an application that works ; and, then, if needed, you'll optimize it (having some performance notions while developping will help you have an application that's easier to optimize, though).

Note I would not say that "having notions about performances" is "premature optimization", as long as you don't just "optimize", but just "write correctly" ; I would rather call it good practice that'll help to write better quality code ;-)

2
votes

What Knuth means is: it's really, really important to know about SQL optimisation but only when you need to. As you say, "most of the time ... high performance is not an issue."

It's that 3% of times when you do need high performance that it's important to know what rules to break and why.

However, unlike procedural languages, even for 150k rows it can be important to know a little about how your query is processed. For instance free text searching will be very slow compared with searching through exact matches on indexed columns. It's going the final steps into e.g. sharding or full denormalisation where most DBAs and developers draw the line.

1
votes

I wouldn't say that SQL optimization has as many pitfalls as premature programming optimization. Designing your schema and queries ahead of time with performance in mind can help you avoid some really nasty redesigns later on. That being said, spending a day getting rid of a table scan can be utterly worthless to you in the long run if that query isn't a slow query, can be cached, or is rarely called in a manner that would impact your application.

I personally profile my queries and focus on the worst, and most used queries. Careful design ahead of time cuts out most of the worst.

1
votes

I would say that you should make the SQL as easily readeble as possible, and only worry about the performance once it hits you.

That said.

Be mindfull of standard things as you develop, such as indexes, sub selects, use of cursors where a standard query would do the job, etc.

It will not hurt to develop the original correctly, and you can optimize the problems later when it is needed.

EDIT

Also remeber that maintainability of your SQL code is very important, and that debugging SQL is slightly more difficult than normal coding.

1
votes

Knuth says "forget about 97%" but for a typical web app it's in the database IO where 97% of the request time is spent. This is where a little optimization effort can yield greatest results.

If this is the kind of apps you're writing I strongly suggest learning as much of how RDBMSes work as you can afford. Other people give you excellent suggestions, and I'd add that I usually follow this list top-down when deciding how to spent my "optimization budget":

  1. Schema design. Think twelve times about normalizaton and access strategies. This will save you many painful hours later.

  2. Query readability. Related to #1, sometimes trying to reogranize your queries gives a better understanding of how schema should look. Also it'll help later when you ask for help.

  3. Avoid subqueries in SELECT list, use JOINs.

  4. If there are slow queries reach for Profiler. Check for missing indexes first And finally, if there are still slow queries, try to rewrite it.

Keep in mind also, that database performance very much depends on data distribution and number of simultaneous requests (because of locking). Even though a query completes in 1 sec. on your underpowered netbook it could take 15 seconds on the 8-core server. If possible, check your queries on actual data. If you know that concurrency is going to be high it's (paradoxically) better to use many small queries than one big one.

0
votes

I agree with everything that's said here, and I'd like to add: make sure that your SQL is well-encapsulated so that, when you discover what needs to be optimized, there's only one place you need to change it, and the change will be transparent to whatever code calls it.

Personally, I like to encapsulate all of my SQL in PL/SQL procedures, but there are some who disagree with that. Whatever you do, I recommend trying to avoid putting your SQL "inline" with other sourcecode. That seems to always lead to cut-and-pasting and quickly becomes hard to maintain. Put your SQL elsewhere, and try to re-use it as much as possible.

Also, read up on indexes, how they really work, and when you should and shouldn't use them. A lot of people's first instinct, when they get a slow query, is to index the table to death. That might solve the problem in the short term, but long-term an over-index table will be slow to insert and update into. A few well-chosen indexes are much better than indexing every field. Try reading "Refactoring SQL Applications" by Stephane Faroult.

Finally, as said above, a properly normalized database design will help avoid 99% of your slow queries. Denormalization is neccesary sometimes, but it's important that you know the rules, before you break them.

Good luck!