1
votes

We are using a CloudantDB as a document store, containing a list of data that we want to process.

At runtime, we basically want to get one document, process it, and if processed successfully remove it from the DB.

The only mechanisms I see are either to get the entire list of documents (which might not be good for us, since it is likely to be a very large list), or to get an individual document if we have the ID (which we won't have to start with). If I were dealing with a conventional SQL database, I might have a cursor which I only advance when I want to process a document.

I am familiar with views, but I am not sure that helps here either.

Am I missing some option?

2
See my answer below. If there are other constraints that prevent you from doing it this way, please update your question and I will be more than happy to take a look. - markwatsonatx

2 Answers

4
votes

There are a number of options for retrieving documents from Cloudant. Views are the underlying technology that allows you to query, sort, and aggregate documents. In your particular example it sounds like you just want to get the most (or least) recent document. You can do this with a view, or in Cloudant you can simply create an index.

Suppose you have a date field called create_date. In Cloudant you can create an index like so (go to Query then click edit next to "Your available indexes"):

{
  "index": {
    "fields": [
      "create_date"
    ]
  },
  "type": "json"
}

This will create a view and you will see it listed under "Design Documents". You can query that view in the dashboard as follows:

{
  "selector": {
    "create_date": {
      "$gt": 0
    }
  },
  "fields": [
    "_id",
    "_rev"
  ],
  "sort": [
    {
      "create_date": "desc"
    }
  ],
  "limit": 1
}

Note, I have limited my query to 1 document. This will return the most recent document added to Cloudant. To retrieve the earliest document added to Cloudant change the sort to "create_date": "asc".

You can run this outside of the dashboard by sending an HTTP POST to /db/_find. See this link for more information:

https://docs.cloudant.com/cloudant_query.html#finding-documents-using-an-index
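As a rough sketch of that POST from Python, using only the standard library: the base URL and database name passed in are placeholders, the helper names are made up for illustration, and authentication is left out entirely, so treat this as a starting point rather than a finished client.

```python
import json
import urllib.request


def build_find_query(sort_direction="desc", limit=1):
    """Build the same query body as the dashboard example.

    Assumes the documents carry a numeric create_date field.
    """
    return {
        "selector": {"create_date": {"$gt": 0}},
        "fields": ["_id", "_rev"],
        "sort": [{"create_date": sort_direction}],
        "limit": limit,
    }


def build_find_request(base_url, db, query):
    """Prepare (but do not send) a POST to /<db>/_find.

    base_url and db are hypothetical placeholders; credentials are omitted.
    """
    return urllib.request.Request(
        url=f"{base_url}/{db}/_find",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it you would pass the request to urllib.request.urlopen and read the "docs" list from the JSON response. Use "asc" as the sort direction to get the oldest document first.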

UPDATE: Using text indexes and bookmarks

The above approach assumes you are going to delete each document and re-run the query every time. If you used an ascending sort you would always process the documents in order, but if you used a descending sort you could process newer documents as they are inserted.
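The delete-and-requery loop described above could look roughly like this. The three callables are hypothetical hooks you would supply yourself: find_one would run the limit-1 query, process is your own logic, and delete would issue a DELETE on /db/<id>?rev=<rev>. The sketch stops at the first failure so it does not keep fetching the same unprocessable document.

```python
def drain(find_one, process, delete):
    """Fetch one document, process it, delete it on success, repeat.

    find_one() -> dict with "_id" and "_rev", or None when nothing is left
    process(doc) -> True if the document was handled successfully
    delete(doc_id, rev) -> removes the document from the database
    Returns the number of documents processed and deleted.
    """
    handled = 0
    while True:
        doc = find_one()
        if doc is None:
            break  # nothing left to process
        if not process(doc):
            break  # leave the document in place and stop
        delete(doc["_id"], doc["_rev"])
        handled += 1
    return handled
```

Because every successful pass removes the document it just read, the limit-1 query always returns the next unprocessed document and no cursor state has to be kept.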

An alternative approach would be to use bookmarks (as suggested by the OP in the comments below). To do so, first create a text index in Cloudant:

{
  "index": {},
  "type": "text"
}

Run the same query as above. The results will now include a bookmark field similar to the following:

{
  "docs":[{
    "_id":"aa279ae2835f51d8ea13ee3e6ae3a210",
    "_rev":"1-e90f3814f49b3e89158f8d2337de89cb"}
  ],
  "bookmark": "g1AAAAD4eJzLYWBgYM5gTmHQSElKzi9KdUhJMtRLytVNSczRLS5JzEtJLEox1EvOyS9NScwr0ctLLckB6mBKUgCSSfb____PAvPdHK_uzd_TwMCQKJ1Fuml5LECSYQGQAhq4H2HiAWEHoIkKaCaaE23iAYiJ9xEmHhY7AHZjFgAnFk_X"
}

In subsequent queries you can pass the bookmark to traverse the documents in order:

{
  "selector": {
    "create_date": {
      "$gt": 0
    }
  },
  "fields": [
    "_id",
    "_rev"
  ],
  "sort": [
    {
      "create_date": "desc"
    }
  ],
  "limit": 1,
  "bookmark" : "g1AAAAD4eJzLYWBgYM5gTmHQSElKzi9KdUhJMtRLytVNSczRLS5JzEtJLEox1EvOyS9NScwr0ctLLckB6mBKUgCSSfb____PAvPdHK_uzd_TwMCQKJ1Fuml5LECSYQGQAhq4H2HiAWEHoIkKaCaaE23iAYiJ9xEmHhY7AHZjFgAnFk_X"
}

More information about bookmarks can be found here:

https://docs.cloudant.com/cloudant_query.html#working-with-indexes
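Passing the bookmark along by hand gets tedious, so here is a small sketch of a generator that does it for you. The post_find argument is a hypothetical callable that sends a query body to /db/_find and returns the decoded JSON response; how it talks to Cloudant is up to you.

```python
def iter_docs(post_find, query):
    """Yield documents one page at a time, following bookmarks.

    post_find(body) -> decoded JSON response with "docs" and "bookmark"
    query -> the query body without a bookmark (selector, sort, limit, ...)
    """
    bookmark = None
    while True:
        body = dict(query)  # copy so the caller's query is not modified
        if bookmark is not None:
            body["bookmark"] = bookmark
        result = post_find(body)
        docs = result.get("docs", [])
        if not docs:
            return  # an empty page means we have reached the end
        yield from docs
        bookmark = result.get("bookmark")
        if bookmark is None:
            return  # no bookmark returned, so there is nothing to follow
```

With "limit": 1 in the query this behaves much like the SQL cursor mentioned in the question: each step of the loop fetches exactly one more document.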

0
votes

Ok, here's how you can do what you want. As I understand, you probably have a view that you can fetch.

If the view doesn't have many duplicate keys, it shouldn't be a problem. If it does, you could add doc._id to the keys emitted by the view.

What you need is something that behaves like a cursor. Fetching the whole list is obviously not a good idea, but fetching two documents at a time shouldn't be that bad.

First, fetch the first two documents. The second document is needed as the pointer for the next fetch.

Process your document and delete it from CouchDB. Then use the key of the second previously fetched document as the start key to fetch the next document. You can add skip=1 so that you don't fetch a document you have already fetched.

http://url?start_key=previous_doc&limit=1&skip=1
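If you build that URL in code, keep in mind that view keys have to be JSON-encoded, so a string key needs its quotes. A minimal sketch, where view_url is a placeholder for the full URL of your view and the function name is made up for illustration:

```python
import json
from urllib.parse import urlencode


def next_page_url(view_url, previous_key, limit=1, skip=1):
    """Build the URL for the next fetch, starting at previous_key.

    view_url is a hypothetical placeholder for the view's URL.
    The key is JSON-encoded because view keys are JSON values.
    """
    params = {
        "start_key": json.dumps(previous_key),
        "limit": limit,
        "skip": skip,
    }
    return f"{view_url}?{urlencode(params)}"
```

Set skip=0 if the start key belongs to a document you have not processed yet.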