Difference between Partial Replication and Sharding?

Question

I was wondering if sharding is an alternate name for partial replication or not. What I have figured out that --

Partial Repl. – each data item has only copies at some but not all of the nodes (‘Sharding’?)

Pure Partial Repl. – has only copies of a subset of the data item but no node contains a full copy of the database

Hybrid Partial Repl. – a set of nodes are full replicas and another set of nodes are partial replicas

Doron Levari Doron Levari · Accepted Answer · 2013-01-06T01:57:15

Partial replication is an interesting way, in which you distribute the data with replication from a master to slaves, each contains a portion of the data. Eventually you get an array of smaller DBs, read only, each contains a portion of the data. Reads can very well be distributed and parallelized.

But what about the writes?

Those are still clogged, in 1 big fat lazy master database, tasks as buffer management, locking, thread locks/semaphores, and recovery tasks - are the real bottleneck of the OLTP, they make writes impossible to scale... See more in my blog post here: http://database-scalability.blogspot.com/2012/08/scale-up-partitioning-scale-out.html. BTW - your topic right here just gave me a great idea for another post. I'll link to this question and give you the credit! :)

Sharding is where data appears only once, within an array of DBs. Each database is the complete owner of the data, data is read from there, data is written to there. This way, reads and writes are distributed and parallelized. Real scale-out can be acheived.

Sharding is a mess to handle, to maintain, it's hard as hell. ScaleBase (I work there), enable automatic transparent scale-out, just throw it in the middle and you'll have 10 DBs at the back, and it'll look like 1 to your app. Automatic, transparent super-sharding - in a box.

Difference between Partial Replication and Sharding?

2 Answers