I am choosing a platform for a web application.
I understand how cloud computing can scale front end servers, but what do they do with the database servers?
Is there something that the developer has to do to allow for this?
In general, yes. One common way to scale a DB across multiple machines is to use a column store. That way each column in a table can be stored on a separate machine, dramatically increasing the amount of CPU power and bandwidth available to a search. Searches can also run in parallel: a search on the company column hits only the server holding that column, so a simultaneous search on the year column is not slowed down.
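From what I've read, Google's MapReduce applies the same idea of splitting work across many machines and running it in parallel, although it is a processing framework rather than a storage format.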
The benefits section of Wikipedia's column store page is particularly informative.
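To make the column layout concrete, here is a minimal in-memory sketch. It is not how any particular column store is implemented: the sample rows, the `to_columns` and `scan` helpers, and the use of threads to stand in for separate servers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Row-oriented input: each dict is one record (made-up sample data).
rows = [
    {"company": "Acme", "year": 2007, "revenue": 120},
    {"company": "Globex", "year": 2008, "revenue": 340},
    {"company": "Acme", "year": 2008, "revenue": 150},
]

def to_columns(rows):
    """Pivot rows into one list per column; index i in every list is row i."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def scan(column, predicate):
    """Return the set of row ids whose value in this column matches."""
    return {i for i, value in enumerate(column) if predicate(value)}

columns = to_columns(rows)

# Each scan touches a single column, so the two can run in parallel.
# Threads stand in here for the servers that would each hold one column.
with ThreadPoolExecutor() as pool:
    by_company = pool.submit(scan, columns["company"], lambda v: v == "Acme")
    by_year = pool.submit(scan, columns["year"], lambda v: v == 2008)
    matches = by_company.result() & by_year.result()

print(sorted(matches))  # row ids that satisfy both conditions
```

Neither scan ever reads the `revenue` column, which is where the bandwidth saving comes from when only a few columns are queried.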
Along similar lines, OLAP is interesting. OLAP changes the read/write tradeoff completely. Querying and reading are fast for large and complicated queries, but writing new data requires a time-consuming rebuild process.
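The read/write tradeoff can be sketched with a tiny precomputed aggregate table. This is only a toy model under my own assumptions (the `facts` data, the `build_cube` helper, and the full rebuild on every write are hypothetical); real OLAP engines are far more sophisticated about what they precompute and how they refresh it.

```python
from collections import defaultdict

# Made-up fact rows: (company, year, revenue).
facts = [
    ("Acme", 2007, 120),
    ("Globex", 2008, 340),
    ("Acme", 2008, 150),
]

def build_cube(facts):
    """Precompute revenue totals for every (company, year) grouping level.

    None stands for "all values" of that dimension.
    """
    cube = defaultdict(int)
    for company, year, revenue in facts:
        for key in ((company, year), (company, None), (None, year), (None, None)):
            cube[key] += revenue
    return dict(cube)

cube = build_cube(facts)

# Reads are plain dictionary lookups, however many facts were aggregated.
acme_total = cube[("Acme", None)]
total_2008 = cube[(None, 2008)]

# In this sketch a write does not patch the cube in place: the whole
# structure is rebuilt from the facts, which is the slow part.
facts.append(("Globex", 2007, 200))
cube = build_cube(facts)
grand_total = cube[(None, None)]
```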
It very much depends on the solution that you select for your backend. Some applications use a mix to handle different types of data.
Databases such as MySQL or PostgreSQL are difficult to work with when scaling is necessary. For our project we decided to use Cassandra (which probably did not yet exist at the time you asked!). It lets you store data on any number of backend computers. That in turn lets backend processes run on completely separate computers, so you can do all sorts of computations without slowing down the database or the front end (i.e. Apache).
I talk about this in our project on this page:
http://snapwebsites.org/implementation/snap-websites-processes
Search for the word "Process". There is also an image that represents the different processes, and each one can run on a separate computer (if you have such a large load that you need more horsepower).
And actually some of the Snap! backends shown in that example can run on multiple computers: while one instance handles this website, another instance handles that other website. Quite powerful.
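For readers wondering how data can be spread over "any number of backend computers", the general idea is to hash each row key onto a ring of servers. The sketch below is a generic consistent-hash ring written for illustration; it is not Cassandra's code or API, and the `HashRing` class, the `db1`..`db4` node names, and the choice of MD5 are all assumptions.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Hypothetical consistent-hash ring with several points per node."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns `vnodes` points so the load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]

small = HashRing(["db1", "db2", "db3"])
large = HashRing(["db1", "db2", "db3", "db4"])

keys = [f"user:{i}" for i in range(1000)]
moved = sum(small.node_for(k) != large.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys change servers when db4 joins")
```

The point of the ring is that adding a server only relocates the keys the new server takes over, instead of reshuffling everything.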
If you're using a cloud provider that simply gives you ssh access to a virtual box, you'll need to implement your own database scaling. If you run on Google AppEngine, the Intuit Partner Platform or something similar, the scalability is built into the datastore provided to you.
Basically, there's nothing magical about cloud computing. In order to gain this built-in scalability, you give up some freedom. Google's datastore doesn't provide all the aspects of a full relational database, but you can scale to ridiculous amounts of traffic.
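If you do end up implementing your own scaling, the simplest form is probably key-based sharding, and it shows the kind of freedom you give up. The following is a rough sketch under my own assumptions: `ShardedStore` is a hypothetical class, plain dicts stand in for database servers, and the shard count is fixed up front.

```python
import zlib

class ShardedStore:
    """Hypothetical key-value store split across several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database server.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, record):
        self._shard_for(key)[key] = record

    def get(self, key):
        # Lookups by key touch exactly one shard.
        return self._shard_for(key).get(key)

    def find(self, predicate):
        # No shard holds all the data, so a query on anything other than
        # the key has to visit every shard and merge the results.
        return [r for shard in self.shards for r in shard.values() if predicate(r)]

store = ShardedStore(shard_count=4)
store.put("user:1", {"name": "Ann", "country": "FR"})
store.put("user:2", {"name": "Bob", "country": "US"})
store.put("user:3", {"name": "Cy", "country": "FR"})

print(store.get("user:2"))
print(sorted(r["name"] for r in store.find(lambda r: r["country"] == "FR")))
```

Lookups by key stay cheap no matter how many shards you add, while joins and ad hoc queries become scatter-gather operations, which is roughly the restriction that hosted datastores impose on you as well.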
As far as the how, I recently came upon a paper dedicated to exactly this. It was discussed in a lecture, so although I'm familiar with the paper's contents, I haven't read it myself. Still, the lecture had very interesting ideas: http://reports-archive.adm.cs.cmu.edu/anon/2008/CMU-CS-08-150.pdf