3
votes

If you have scaled SQL server with one DB for writes and multiple DBs for reads. Wouldn't there be a delay for data to be replicated from the write DB to the to other read databases? In which case isn't the data inconsistent?

So where would a scaled relational DB fall in the CAP theorem?

Update:

In relational DBs consistency means there wont be partial updates. For example if someone transfers money from one account to another and the whole thing is a part of one transaction, it wont happen that you take money out of one account but doesn't show up in another account.

In CAP theorem consistence means all the components see the same data. That consistency is different from consistency in ACID.

From what I know, relational DBs like SQL server are supposed to be CA (consistent and available). This would make sense if there is just one database. Because everyone would see the same data. But what if the SQL server is scaled with multiple databases? In that case would all databases still see the same data? If not, would it be consistent (in CAP theorem)?

My feeling is a scaled relational DB is AP (Available and partition tolerant) and not CA (Consistent and available).

2
"In which case isn't the data inconsistent?" And what is the CAP definition of "inconsistent", and why would such a delay imply such inconsistency?philipxy
@philipxy So there are three main forms of consistency. Strict, casual and eventual. All these consistencies talk about the same version of data being replicated across all nodes. But the consistency in ACID is more related to atomicity I think. Where you cannot have partial data committed. My question is along the lines of distributed systems consistency and not ACID consistency. I believe they are different definitions of consistency. Please correct me if I am wrong.Foo
Please clarify via edits, not comments. PS You have not addressed my last comment: given a definition of CAP consistency or for that matter CAP & given your reasoning. Including: What is "scaled" & what does it have to do with CAP? What is "delay" & what does it have to do with CAP? (The only "delay" CAP addresses is CAP availability.) What exactly do you mean by falling somewhere in CAP? You just use a lot of terms that you don't clarify (let alone use the CAP definitions of) & you don't give your reasoning.philipxy
philip Why so much animosity dude? I am just trying to learn. There is no need for this belittling tone.Foo

2 Answers

3
votes

I've read different definitions of consistency in regards to the CAP theorem.

  1. Some definitions of consistency say that once some data is persisted in a system, all reads will read the most recently written data. In this definition, a replicated database (you call this "scaled" but I wouldn't use that term) has a risk of returning inconsistent data, if the replication is asynchronous.

    To mitigate this risk, some systems make sure replication is synchronous, or as close to synchronous as they can implement. Galera, for example, sends transaction write sets to its replicas synchronously. If you try to read from the replica, and it detects that there are write sets pending but not yet applied, it can block your read until it has caught up with the pending write sets (this behavior is configurable). So you'll never read data that is out of date.

    The cost of maintaining perfectly consistent reads over distributed systems in this manner is usually more expensive than users want. It will become a performance bottleneck in a system that has a high rate of updates. So for practical reasons, most projects accept that "replication lag" is a necessary compromise.

  2. Other definitions of consistency are closer to atomicity, i.e. transactions will not be persisted in a partially-complete state. So all constraints will be satisfied when you read the data, whether you read the data before or after the transaction is applied. In this definition, it's quite easy to imagine the replica database instance remaining consistent, if it applies updates using the same transaction semantics used on the master. If you read data from the replica, you might read data that hasn't yet had the latest updates applied, but it will never be in an inconsistent state with respect to constraints.

0
votes

There is nothing called a scaled RDBMS. We do have "RDBMS Clusters with shared storage": here can keep on adding nodes to achieve high availability of RDBMS.

In other words: If you meant a "Distributed RDBMS" by mentioning "Scaled RDBMS" - it doesn't exist. You can have RDBMS on only one node. If you add another node, then that will be "another" RDBMS and it would NOT coalesce with the first one giving you a single view(unlike a typical NoSQL Database). Although, you can happily keep on adding storage nodes behind the RDBMS.