145
votes

I know there are similar questions here, but they either tell me to switch back to a regular RDBMS if I need transactions, or to use atomic operations or two-phase commit. The second solution seems the best choice. The third I don't wish to follow, because it seems that many things could go wrong and I can't test it in every aspect. I'm having a hard time refactoring my project to perform atomic operations. I don't know whether this comes from my limited viewpoint (I have only worked with SQL databases so far), or whether it actually can't be done.

We would like to pilot test MongoDB at our company. We have chosen a relatively simple project: an SMS gateway. It allows our software to send SMS messages to the cellular network, and the gateway does the dirty work: actually communicating with the providers via different communication protocols. The gateway also manages the billing of the messages. Every customer who applies for the service has to buy some credits. The system automatically decreases the user's balance when a message is sent and denies access if the balance is insufficient. Also, because we are customers of third-party SMS providers, we may have our own balances with them. We have to keep track of those as well.

I started thinking about how I could store the required data with MongoDB if I cut down some complexity (external billing, queued SMS sending). Coming from the SQL world, I would create a separate table for users, another for SMS messages, and one for storing the transactions regarding the users' balances. Let's say I create separate collections for all of those in MongoDB.

Imagine an SMS sending task with the following steps in this simplified system:

  1. check if the user has sufficient balance; deny access if there's not enough credit

  2. send the message and store it in the SMS collection with the details and cost (in the live system the message would have a status attribute and a task would pick it up for delivery, setting the price of the SMS according to its current state)

  3. decrease the user's balance by the cost of the sent message

  4. log the transaction in the transaction collection
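
In naive shell code the flow would look something like this (just a sketch; collection, field, and variable names are made up):

    // Illustrative only: three independent writes with nothing tying them together.
    var user = db.users.findOne({ _id: userId });                      // step 1
    if (user.balance < cost) throw new Error("insufficient balance");
    db.messages.insert({ user: userId, body: body, cost: cost });      // step 2
    db.users.update({ _id: userId }, { $inc: { balance: -cost } });    // step 3
    db.transactions.insert({ user: userId, amount: -cost });           // step 4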

Now what's the problem with that? MongoDB can do atomic updates only on one document. In the previous flow it could happen that some kind of error creeps in and the message gets stored in the database but the user's balance is not updated and/or the transaction is not logged.

I came up with two ideas:

  • Create a single collection for users, and store the balance as a field and the user-related transactions and messages as subdocuments in the user's document. Because we can update documents atomically, this actually solves the transaction problem. Disadvantages: if the user sends many SMS messages, the size of the document could become large and the 4MB document limit could be reached. Maybe I could create history documents in such scenarios, but I don't think this would be a good idea. Also, I don't know how fast the system would be if I push more and more data into the same big document.

  • Create one collection for users, and one for transactions. There can be two kinds of transactions: credit purchases with a positive balance change, and messages sent with a negative balance change. A transaction may have a subdocument; for example, for messages sent, the details of the SMS can be embedded in the transaction. Disadvantages: I don't store the current user balance, so I have to calculate it every time a user tries to send a message to tell whether the message can go through or not. I'm afraid this calculation could become slow as the number of stored transactions grows. (A sketch of both layouts follows this list.)
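
To make the two ideas concrete, here is roughly what the documents could look like (just a sketch; all names and values are illustrative):

    // Idea 1: one big user document, updated atomically.
    {
        _id: ObjectId("..."),
        name: "customer1",
        balance: 80,
        transactions: [
            { type: "purchase", amount: 100 },
            { type: "sms", amount: -20,
              message: { to: "+36201234567", body: "hello", status: "sent" } }
        ]
    }

    // Idea 2: a users collection plus a transactions collection;
    // the balance is the sum of the user's transaction amounts.
    { _id: ObjectId("..."), user: ObjectId("..."), type: "purchase", amount: 100 }
    { _id: ObjectId("..."), user: ObjectId("..."), type: "sms", amount: -20,
      message: { to: "+36201234567", body: "hello", status: "sent" } }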

I'm a little bit confused about which method to pick. Are there other solutions? I couldn't find any best practices online about how to work around these kinds of problems. I guess many programmers who are trying to become familiar with the NoSQL world are facing similar problems in the beginning.

10
Forgive me if I am wrong, but it looks as though this project is going to use a NoSQL data store regardless of whether it will benefit from it or not. NoSQL is not an alternative to SQL as a "fashion" choice, but for when the technology of relational RDBMSs does not fit the problem space and a non-relational datastore does. A lot of your question has "If it was SQL then ..." and that rings warning bells to me. All the NoSQL stores have come from a need to solve a problem that SQL couldn't; then they have been somewhat generalised to make them easier to use, and of course the bandwagon starts rolling. – PurplePilot
I'm aware that this project is not exactly the best for trying out NoSQL. However, I'm afraid that if we start to use it with other projects (let's say a library collection management software, because we are into collection management) and suddenly some kind of request comes in which needs transactions (and it's actually there; imagine that a book is transferred from one collection to another), we need to know how we can overcome the problem. Maybe it's just me who is narrow-minded and thinks there's always a need for transactions. But there could be a way to overcome these somehow. – NagyI
I agree with PurplePilot: you should choose a technology that fits a solution, not try to graft a solution that isn't appropriate onto a problem. Modeling data for graph databases is a completely different paradigm than RDBMS design, and you have to forget everything you know and relearn the new way of thinking. – user177800
I do understand I should use the appropriate tool for the task. However, for me - when I read answers like this - it seems that NoSQL is not good for anything where data is critical. It's good for Facebook or Twitter, where if some comments get lost the world goes on, but anything beyond that is out of the question. If that's true, I don't get why others care about building e.g. a webstore with MongoDB: kylebanker.com/blog/2010/04/30/mongodb-and-ecommerce It even mentions that most transactions can be worked around with atomic operations. What I'm searching for is the how. – NagyI
You say "it seems that NoSQL is not good for anything where data is critical" is not true where it is not good (maybe) is transactional ACID type transactional processing. Also NoSQL's are designed for distributed data stores which SQL type stores can be very difficult to achieve when you get into the master slave replication scenarios. NoSQL have strategies for eventual consistency and ensuring only the latest data set is used but not ACID.PurplePilot

10 Answers

26
votes

As of 4.0, MongoDB will have multi-document ACID transactions. The plan is to enable those in replica set deployments first, followed by sharded clusters. Transactions in MongoDB will feel just like the transactions developers are familiar with from relational databases - they'll be multi-statement, with similar semantics and syntax (like start_transaction and commit_transaction). Importantly, the changes to MongoDB that enable transactions do not impact performance for workloads that do not require them.

For more details see here.

Having distributed transactions doesn't mean that you should model your data as in tabular relational databases. Embrace the power of the document model and follow the good and recommended practices of data modeling.
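
For a rough feel of the syntax, a minimal shell sketch (collection names and the userId/cost variables are made up; see the retry-logic example in a later answer for the full pattern):

    session = db.getMongo().startSession();
    session.startTransaction();
    try {
        session.getDatabase("test").users.updateOne({ _id: userId }, { $inc: { balance: -cost } });
        session.getDatabase("test").messages.insertOne({ user: userId, cost: cost });
        session.commitTransaction();   // both writes become visible together
    } catch (error) {
        session.abortTransaction();    // neither write is applied
        throw error;
    }
    session.endSession();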

85
votes

Living Without Transactions

Transactions support ACID properties, but although there are no transactions in MongoDB, we do have atomic operations. Atomic operations mean that when you work on a single document, that work will be completed before anyone else sees the document; they'll see all the changes we made or none of them. Using atomic operations, you can often accomplish the same thing you would have accomplished using transactions in a relational database. The reason is that, in a relational database, changes need to be made across multiple tables - usually tables that need to be joined - so you want to do it all at once, and since there are multiple tables, you have to begin a transaction, do all those updates, and then end the transaction.

But with MongoDB we embed the data, since we pre-join it in documents, and they're rich documents that have hierarchy, so we can often accomplish the same thing. For instance, in the blog example, if we wanted to make sure that we updated a blog post atomically, we can do that, because we can update the entire blog post at once. Whereas if it were a bunch of relational tables, we'd probably have to open a transaction so that we could update the post collection and the comments collection.

So what are our approaches that we can take in MongoDB to overcome a lack of transactions?

  • restructure - restructure the code, so that we're working within a single document and taking advantage of the atomic operations that we offer within that document. And if we do that, then usually we're all set.
  • implement in software - we can implement locking in software by creating a critical section. We can build a test-and-set using findAndModify. We can build semaphores, if needed. And in a way, that is the way the larger world works anyway: if one bank needs to transfer money to another bank, they're not living in the same relational system. They each often have their own relational databases, and they have to be able to coordinate that operation even though we cannot begin and end a transaction across those database systems, only within one system, within one bank. So there are certainly ways in software to get around the problem.
  • tolerate - the final approach, which often works in modern web apps and other applications that take in a tremendous amount of data, is to just tolerate a bit of inconsistency. An example would be a friend feed in Facebook: it doesn't matter if everybody sees your wall update simultaneously; it's okay if one person is a few beats behind for a few seconds and then catches up. It often isn't critical in a lot of system designs that everything be kept perfectly consistent and that everyone have a perfectly consistent and identical view of the database. So we could simply tolerate a little bit of inconsistency that's somewhat temporary.

Update, findAndModify, $addToSet (within an update) & $push (within an update) operations operate atomically within a single document.
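
For instance, the blog example from above needs no transaction once the post and its comments live in one document; a single update is atomic (a sketch, with made-up collection and field names):

    // One document, one update: the comment is appended and the
    // counter incremented atomically - readers see both or neither.
    db.posts.updateOne(
        { _id: postId },   // postId: the _id of the post, assumed to exist
        {
            $push: { comments: { author: "alice", text: "Nice post!" } },
            $inc: { commentCount: 1 }
        }
    );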

24
votes

Check this out, by Tokutek. They develop a plugin for Mongo that promises not only transactions but also a boost in performance.

11
votes

To bring it to the point: if transactional integrity is a must, then don't use MongoDB; use only components in the system that support transactions. It is extremely hard to build something on top of a component in order to provide ACID-like functionality for non-ACID-compliant components. Depending on the individual use cases, it may make sense to separate actions into transactional and non-transactional actions in some way...

7
votes

Now what's the problem with that? MongoDB can do atomic updates only on one document. In the previous flow it could happen that some kind of error creeps in and the message gets stored in the database but the user's balance is not updated and/or the transaction is not logged.

This is not really a problem. The error you mentioned is either a logical error (a bug) or an IO error (network, disk failure). Such errors can leave both transactionless and transactional stores in an inconsistent state. For example, if the SMS has already been sent but an error occurs while storing the message, the SMS sending can't be rolled back, which means it won't be logged, the user's balance won't be reduced, etc.

The real problem here is that the user can take advantage of a race condition and send more messages than his balance allows. This also applies to an RDBMS, unless you do the SMS sending inside a transaction with locking on the balance field (which would be a great bottleneck). A possible solution for MongoDB would be using findAndModify first to reduce the balance and check it; if it's negative, disallow sending and refund the amount (an atomic increment). If it's positive, continue sending, and in case sending fails, refund the amount. A balance history collection can also be maintained to help fix/verify the balance field.
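
A sketch of that reserve-then-refund idea in the shell (names are illustrative; sendSms stands in for the actual provider call, and the user document is assumed to exist):

    // Atomically reserve the cost up front.
    var user = db.users.findAndModify({
        query: { _id: userId },
        update: { $inc: { balance: -cost } },
        new: true                              // return the updated document
    });

    if (user.balance < 0) {
        // Overdrawn: refund atomically and deny the send.
        db.users.update({ _id: userId }, { $inc: { balance: cost } });
    } else {
        try {
            sendSms(message);                  // hypothetical provider call
        } catch (e) {
            // Sending failed: refund the reserved amount.
            db.users.update({ _id: userId }, { $inc: { balance: cost } });
        }
    }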

6
votes

The project is simple, but you have to support transactions for payments, which makes the whole thing difficult. So, for example, a complex portal system with hundreds of collections (forum, chat, ads, etc.) is in some respects simpler, because if you lose a forum or chat entry, nobody really cares. If, on the other hand, you lose a payment transaction, that's a serious issue.

So, if you really want a pilot project for MongoDB, choose one which is simple in that respect.

6
votes

Transactions are absent in MongoDB for valid reasons. This is one of those things that make MongoDB faster.

In your case, if transactions are a must, Mongo seems not a good fit.

Maybe RDBMS + MongoDB, but that will add complexity and make it harder to manage and support the application.

6
votes

This is probably the best blog post I found regarding implementing transaction-like features for MongoDB:

  • Syncing Flag: best for just copying data over from a master document

  • Job Queue: very general purpose; solves 95% of cases. Most systems need to have at least one job queue around anyway!

  • Two-Phase Commit: this technique ensures that each entity always has all the information needed to get to a consistent state

  • Log Reconciliation: the most robust technique, ideal for financial systems

  • Versioning: provides isolation and supports complex structures

Read this for more info: https://dzone.com/articles/how-implement-robust-and
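
As a taste of the two-phase commit pattern, a heavily simplified happy-path sketch (the transactions collection, its state values, and the accounts are illustrative; the linked article and the MongoDB docs cover crash recovery):

    // 1. Record the intent.
    var txnId = ObjectId();
    db.transactions.insert({ _id: txnId, source: "A", dest: "B", amount: 100, state: "initial" });
    db.transactions.update({ _id: txnId, state: "initial" }, { $set: { state: "pending" } });

    // 2. Apply the change to each account, tagging it with the txn id so an
    //    interrupted transaction can later be rolled forward or back.
    db.accounts.update({ _id: "A", pendingTransactions: { $ne: txnId } },
                       { $inc: { balance: -100 }, $push: { pendingTransactions: txnId } });
    db.accounts.update({ _id: "B", pendingTransactions: { $ne: txnId } },
                       { $inc: { balance: 100 }, $push: { pendingTransactions: txnId } });

    // 3. Mark it applied, then remove the tags.
    db.transactions.update({ _id: txnId, state: "pending" }, { $set: { state: "applied" } });
    db.accounts.update({ _id: "A" }, { $pull: { pendingTransactions: txnId } });
    db.accounts.update({ _id: "B" }, { $pull: { pendingTransactions: txnId } });
    db.transactions.update({ _id: txnId, state: "applied" }, { $set: { state: "done" } });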

4
votes

This is late, but I think it will help in the future. I use Redis to make a queue to solve this problem.

  • Requirement:
    Two actions need to execute concurrently, but phase 2 and phase 3 of action 1 need to finish before phase 2 of action 2 starts, or the opposite (a phase can be a REST API request, a database request, or executing JavaScript code...). [image of the two actions omitted]

  • How a queue helps you
    The queue makes sure that every block of code between lock() and release(), across many functions, will not run at the same time; it keeps them isolated.

    function action1() {
      phase1();
      queue.lock("action_domain");
      phase2();
      phase3();
      queue.release("action_domain");
    }
    
    function action2() {
      phase1();
      queue.lock("action_domain");
      phase2();
      queue.release("action_domain");
    }
    
  • How to build a queue
    I will only focus on how to avoid the race condition when building a queue on the backend side. If you don't know the basic idea of a queue, see here.
    The code below only shows the concept; you need to implement it in a correct way.

    function lock() {
      if(isRunning()) {
        addIsolateCodeToQueue(); // use a callback, delegate, function pointer... depending on your language
      } else {
        setStateToRunning();
        pickOneAndExecute();
      }
    }
    
    function release() {
      setStateToRelease();
      pickOneAndExecute();
    }
    

But you need isRunning(), setStateToRelease() and setStateToRunning() to be isolated themselves, or else you face a race condition again. To do this I chose Redis, for ACID purposes and scalability.
The Redis documentation says this about its transactions:

All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.

P.S.:
I use Redis because my service already uses it; you can use anything else that supports isolation to do that.
The action_domain in my code above is for when you only need action 1 called by user A to block action 2 of user A, without blocking other users. The idea is to use a unique lock key for each user.
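
For illustration, the acquire/release steps can each be made atomic with a single Redis command (a minimal Node.js sketch assuming the ioredis client; key names and the timeout are made up):

    const Redis = require("ioredis");
    const redis = new Redis();

    // Acquire: SET ... NX PX is one atomic command, so two callers can never
    // both succeed for the same action_domain. The token proves ownership.
    async function lock(actionDomain, token) {
        const ok = await redis.set("lock:" + actionDomain, token, "PX", 5000, "NX");
        return ok === "OK";
    }

    // Release: delete the lock only if we still own it, atomically via Lua.
    const releaseScript =
        'if redis.call("get", KEYS[1]) == ARGV[1] then ' +
        '    return redis.call("del", KEYS[1]) ' +
        'else return 0 end';

    async function release(actionDomain, token) {
        return redis.eval(releaseScript, 1, "lock:" + actionDomain, token);
    }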

3
votes

Transactions are now available in MongoDB 4.0. Sample below.

// Runs the txnFunc and retries if TransientTransactionError encountered

function runTransactionWithRetry(txnFunc, session) {
    while (true) {
        try {
            txnFunc(session);  // performs transaction
            break;
        } catch (error) {
            // If transient error, retry the whole transaction
            if ( error.hasOwnProperty("errorLabels") && error.errorLabels.includes("TransientTransactionError")  ) {
                print("TransientTransactionError, retrying transaction ...");
                continue;
            } else {
                throw error;
            }
        }
    }
}

// Retries commit if UnknownTransactionCommitResult encountered

function commitWithRetry(session) {
    while (true) {
        try {
            session.commitTransaction(); // Uses write concern set at transaction start.
            print("Transaction committed.");
            break;
        } catch (error) {
            // Can retry commit
            if (error.hasOwnProperty("errorLabels") && error.errorLabels.includes("UnknownTransactionCommitResult") ) {
                print("UnknownTransactionCommitResult, retrying commit operation ...");
                continue;
            } else {
                print("Error during commit ...");
                throw error;
            }
        }
    }
}

// Updates two collections in a transaction

function updateEmployeeInfo(session) {
    employeesCollection = session.getDatabase("hr").employees;
    eventsCollection = session.getDatabase("reporting").events;

    session.startTransaction( { readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } } );

    try{
        employeesCollection.updateOne( { employee: 3 }, { $set: { status: "Inactive" } } );
        eventsCollection.insertOne( { employee: 3, status: { new: "Inactive", old: "Active" } } );
    } catch (error) {
        print("Caught exception during transaction, aborting.");
        session.abortTransaction();
        throw error;
    }

    commitWithRetry(session);
}

// Start a session.
session = db.getMongo().startSession( { readPreference: { mode: "primary" } } );

try{
   runTransactionWithRetry(updateEmployeeInfo, session);
} catch (error) {
   // Do something with error
} finally {
   session.endSession();
}