3
votes

I am choosing a platform for a web application.

I understand how cloud computing can scale front end servers, but what do they do with the database servers?

Is there something that the developer has to do to allow for this?

7
Heroku presents their database scaling options in a very clear way; check them out at heroku.com/pricing#blossom-1 – Jonas Elfström

7 Answers

2
votes

In general, yes. The most common way to scale a database across multiple machines is to use a column store. That way each column in a table can be stored on a separate machine, dramatically increasing the CPU power and bandwidth available for searching. Searches can also run in parallel: a search on the company column hits only one server, so a simultaneous search on the year column is not slowed down at all.

From what I've read, this is how Google's MapReduce works.

The benefits section of wikipedia's column store page is particularly informative.

Along similar lines, OLAP is interesting. OLAP changes the read/write tradeoff completely: querying and reading are fast even for large, complicated queries, but writing new data requires a time-consuming rebuild process.
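To make the column-store idea above concrete, here is a toy sketch (all table data and names are invented for the example) contrasting row-oriented storage, where a scan touches every record, with column-oriented storage, where each column is a separate array that could live on its own machine:

```python
# Row store: one dict per record; scanning one field still touches every record.
row_store = [
    {"company": "Acme", "year": 2007, "revenue": 120},
    {"company": "Globex", "year": 2008, "revenue": 95},
    {"company": "Acme", "year": 2008, "revenue": 130},
]

# Column store: one list per column. Each list could be hosted on a
# different machine, so a scan over "company" touches only that node.
col_store = {
    "company": ["Acme", "Globex", "Acme"],
    "year": [2007, 2008, 2008],
    "revenue": [120, 95, 130],
}

def rows_matching(column, value):
    """Scan a single column and return the matching row indices."""
    return [i for i, v in enumerate(col_store[column]) if v == value]

idx = rows_matching("company", "Acme")
print(idx)                                     # [0, 2]
print([col_store["revenue"][i] for i in idx])  # [120, 130]
```

A search on `"company"` never reads the `"year"` or `"revenue"` arrays, which is why concurrent scans on different columns don't interfere.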

1
votes

Short Answer: Yes.

Long Answer: It depends. What kind of processing needs to be done? Can it be map-reduced? There are many solutions for this sort of thing. Distributed caching à la memcached can also help scale many services in the backend.
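The usual way memcached helps here is the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. A minimal sketch, where a plain dict stands in for the memcached client and `fetch_user_from_db` is a hypothetical stand-in for a real query:

```python
cache = {}  # stand-in for a memcached client

def fetch_user_from_db(user_id):
    # Pretend this is a slow SQL query against the real database.
    return {"id": user_id, "name": "user%d" % user_id}

def get_user(user_id):
    key = "user:%d" % user_id
    if key in cache:            # cache hit: skip the database entirely
        return cache[key]
    value = fetch_user_from_db(user_id)
    cache[key] = value          # populate the cache for the next caller
    return value
```

Because most reads become cache hits, the database sees only a fraction of the traffic, which is often enough to postpone real database scaling for a long time.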

1
votes

It very much depends on the solution that you select for your backend. Some applications use a mix to handle different types of data.

Databases such as MySQL or PostgreSQL are difficult to work with when scaling is necessary. For our project we decided to use Cassandra (which at the time you asked probably did not yet exist!), which allows you to store data on any number of backend computers. In doing so you also allow backend processes to run on completely separate computers, so you can do all sorts of computations without slowing down the database or the front end (i.e. Apache).
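The mechanism that lets a Cassandra-style store spread rows over any number of machines is a consistent-hash ring: each row key is hashed onto the ring and owned by the next node clockwise. A minimal sketch with invented node names (real Cassandra adds virtual nodes and replication on top of this):

```python
import hashlib
from bisect import bisect

nodes = ["db1", "db2", "db3"]

def _hash(s):
    # Stable hash onto a large integer ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

ring = sorted((_hash(n), n) for n in nodes)
points = [p for p, _ in ring]

def node_for(key):
    """Walk clockwise to the first node at or after the key's hash."""
    i = bisect(points, _hash(key)) % len(ring)
    return ring[i][1]
```

Every key maps deterministically to one node, and adding a node only moves the keys that fall between it and its predecessor on the ring, rather than reshuffling everything.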

I talk about this in our project on this page:

http://snapwebsites.org/implementation/snap-websites-processes

Search for the word "Process". There is also a diagram on that page showing the different processes; each one can run on a separate computer (if your load is large enough to need the extra horsepower).

And some of the Snap! backends shown in that example can actually run on multiple computers: one instance handles this website while another instance handles that other website. Quite powerful.

0
votes

This depends on the database.

Slicehost uses MySQL Cluster, Google uses its much-hyped MapReduce, and so on. It depends on the cloud provider and the database they use.

Others just provide a VM, and you set up your own database on virtual machines with private IPs.

0
votes

If you're using a cloud provider that simply gives you ssh access to a virtual box, you'll need to implement your own database scaling. If you run on Google AppEngine, the Intuit Partner Platform or something similar, the scalability is built into the datastore provided to you.

Basically, there's nothing magical about cloud computing. To gain this built-in scalability, you give up some freedom: Google's datastore doesn't provide all the features of a full relational database, but it lets you scale to ridiculous amounts of traffic.
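For the "implement your own database scaling" case on plain VMs, the simplest starting point is modulo sharding: hash each key to pick one of several database hosts. A sketch with invented hostnames (a real setup also needs a plan for resharding when the host list changes, since modulo routing reshuffles most keys):

```python
import zlib

SHARDS = ["db-0.internal", "db-1.internal", "db-2.internal"]

def shard_for(key):
    # crc32 gives a hash that is stable across runs, unlike Python's hash().
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]
```

The application then opens a connection to `shard_for("user:123")` instead of a single database host; each shard holds only its slice of the data.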

0
votes

Amazon and Google use data stores, which are different from a traditional RDBMS.

You can find some more background information by following this link.

And you can find a short list of datastores here

0
votes

As for the how, I recently came upon a paper dedicated to exactly this topic. It was discussed in a lecture, so although I'm familiar with the paper's contents, I haven't read it myself. Still, the lecture presented very interesting ideas: http://reports-archive.adm.cs.cmu.edu/anon/2008/CMU-CS-08-150.pdf