3
votes

I work with Couchbase (CB) and want to index part of its data in Elasticsearch (ES). The data should stay in sync, i.e. if a document changes in CB, the corresponding document in ES should change too. I have several questions about the best way to do this:

  • What is the best way to sync the data? I saw that there is a CB plugin for ES (http://www.couchbase.com/couchbase-server/connectors/elasticsearch), but is that the recommended way?
  • I don't want to store the whole CB document in ES, only part of it, e.g. some fields should be stored and some not. How can I do that?
  • My documents may have different attributes, and the difference may be big (e.g. 50 different attributes/fields). Assuming I want to index all of these attributes in ES, will it affect performance because so many fields are indexed?

Thanks,

3 Answers

4
votes

Given the doc link, I am assuming you are using Couchbase and not CouchDB.

  1. You are following the correct link for using Elasticsearch with Couchbase. Per the documentation, configure the Cross Data Center Replication (XDCR) capabilities of Couchbase to push data to ES automatically as mutations occur.

  2. Without a defined mapping file, ES will create a default mapping. You can provide your own mapping file (or alter the one it generates) to control which fields get indexed. Refer to the enabled property in the ES documentation at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-object-type.html.

  3. Yes, indexing all fields will affect performance. You can find some performance management tips for the Couchbase integration at http://docs.couchbase.com/couchbase-elastic-search/#managing-performance. The preferred approach to the integration is to perform the search in ES and get back only the keys of the matched documents. You then make a multiget call against the Couchbase cluster to retrieve the document details themselves. So while ES will index many fields, you do not store all fields there, nor do you retrieve their values from ES. The in-memory multiget against Couchbase, using the IDs from ES, is the fastest way to retrieve the matching documents.
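
The two-step pattern from point 3 (search in ES for keys, then multiget from Couchbase) can be sketched roughly as below. This is only an illustration under assumptions: `extract_ids` relies on the standard `hits.hits[]._id` layout of an ES search response, while `fetch_documents` and its `multi_get` argument are hypothetical names; `multi_get` stands in for whatever bulk-get call your Couchbase SDK provides and is assumed to return a dict of key to document.

```python
from typing import Callable, Dict, Iterable, List


def extract_ids(search_response: dict) -> List[str]:
    """Collect the _id of every hit in an Elasticsearch search response."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits]


def fetch_documents(
    ids: Iterable[str],
    multi_get: Callable[[List[str]], Dict[str, dict]],
) -> List[dict]:
    """Fetch full documents for the given keys, preserving the ES ranking order.

    multi_get is a stand-in (assumption) for the bulk-get call of the
    Couchbase SDK in use; it must return a mapping of key -> document.
    """
    id_list = list(ids)
    if not id_list:
        return []  # nothing matched, so skip the round trip to Couchbase
    found = multi_get(id_list)
    # Keys that were deleted since indexing are silently skipped.
    return [found[key] for key in id_list if key in found]


# Stand-in data so the sketch runs without a cluster.
sample_response = {
    "hits": {"hits": [{"_id": "user::1"}, {"_id": "user::2"}]}
}
fake_bucket = {"user::1": {"name": "Ann"}, "user::2": {"name": "Bob"}}

docs = fetch_documents(
    extract_ids(sample_response),
    lambda keys: {k: fake_bucket[k] for k in keys if k in fake_bucket},
)
print(docs)
```

In a real setup the search request would ask ES for IDs only, and the lambda would be replaced by the SDK's bulk-get call.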

1
votes

A lot of questions!

Let me answer one by one:

1) The best and readily available solution is to use the river plugin to sync the data dynamically. It also indexes only the changed documents, which helps performance a lot.

2) Yes, you can restrict which fields are indexed by the river plugin. The plugin documentation is available on the Couchbase website: http://docs.couchbase.com/couchbase-elastic-search/

The river on GitHub is still in development, but you can use the code and modify it to fit your needs:

https://github.com/mschoch/elasticsearch-river-couchbase

3) If you index all the fields, yes, there will be some lag in performance, so it is better to index only the fields you need. If you need a field only for storage, mark it as not analyzed in the mapping. That decreases both indexing time and search time.
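
As a rough sketch of point 3, the snippet below builds a mapping body that indexes only a chosen set of fields as not analyzed and ignores everything else. It rests on several assumptions: the ES 1.x/2.x mapping syntax (`"type": "string"`, `"index": "not_analyzed"`), the `couchbaseDocument` type name and the `doc` wrapper field that another answer here describes, and `"dynamic": false` so that unmapped fields are not indexed. The function name `build_mapping` and the field names are made up for illustration.

```python
import json
from typing import Dict, Iterable


def build_mapping(fields: Iterable[str], type_name: str = "couchbaseDocument") -> Dict:
    """Build a mapping body that indexes only the listed fields.

    Assumes ES 1.x/2.x syntax and that the Couchbase plugin wraps the
    original document in a "doc" field.
    """
    properties = {
        name: {"type": "string", "index": "not_analyzed"}  # exact-match, no analysis
        for name in fields
    }
    return {
        type_name: {
            "properties": {
                "doc": {
                    # Fields not listed below are ignored instead of being indexed.
                    "dynamic": False,
                    "properties": properties,
                }
            }
        }
    }


# Hypothetical field names; the printed JSON is the body for a PUT .../_mapping call.
print(json.dumps(build_mapping(["title", "category"]), indent=2))
```

Non-string fields would need their own types, so treat this as a starting point and check it against the mapping documentation for your ES version.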

Hope it helps!

1
votes

You might find this additional explanation regarding Don Stacy's answer to question 2 useful:

When replicating from Couchbase, there are 3 ways in which you can interfere with Elasticsearch's default mapping (before you start XDCR) and thus, as desired, not store certain fields by setting "store" = false:

  1. Create manual mappings on your index
  2. Create a dynamic template
  3. Edit couchbase_template.json

Hints:

  1. Note that when we do XDCR from Couchbase to Elasticsearch, Couchbase wraps the original document in a "doc" field. This means that you have to take this modified structure into account when you create your mapping. It would look something like this:

    curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
    {
      "couchbaseDocument": {
        "_source": {
          "enabled": false
        },
        "properties": {
          "doc": {
            "properties": {
              "your_field_name": {
                "store": true,
                ...
              },
              ...
            }
          }
        }
      }
    }'
    

    Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

    Including/Excluding fields from _source: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html

  2. Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/dynamic-templates.html

  3. https://forums.couchbase.com/t/about-elasticsearch-plugin/2433 https://forums.couchbase.com/t/custom-maps-for-jsontypes-with-elasticsearch-plugin/395
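
For option 2 (a dynamic template), a minimal sketch of the request body could look like the following. It assumes the ES 2.0 dynamic template syntax from the documentation linked above (`path_match`, `match_mapping_type`, `mapping`) and the `doc` wrapper described in hint 1; the template name `doc_strings_not_stored` and the function `build_dynamic_template` are hypothetical, and the mapping options should be adapted to your ES version.

```python
import json
from typing import Dict


def build_dynamic_template(
    path_pattern: str = "doc.*",
    type_name: str = "couchbaseDocument",
) -> Dict:
    """Build a mapping body with one dynamic template for string fields.

    Assumes ES 2.0 syntax; every new string field whose full path matches
    path_pattern gets "store": false.
    """
    template = {
        "doc_strings_not_stored": {  # hypothetical template name
            "path_match": path_pattern,      # matches fields under the "doc" wrapper
            "match_mapping_type": "string",  # only applies to detected strings
            "mapping": {"type": "string", "store": False},
        }
    }
    return {type_name: {"dynamic_templates": [template]}}


# The printed JSON is the body to send before starting XDCR.
print(json.dumps(build_dynamic_template(), indent=2))
```

As with the manual mapping, this has to be in place on the index before replication starts, otherwise ES will already have created its default mapping for the fields.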