0
votes

I'm using Mongo, Elasticsearch and this river plugin: https://github.com/richardwilly98/elasticsearch-river-mongodb

I have set everything up so that the river keeps the ES data updated whenever Mongo is updated. However, the river copies every property from the Mongo documents into ES, and I only want a small subset of them. For example, if a Mongo doc has 30 properties, all of them are put into ES instead of only the 5 that I want. I assume the issue is with the mappings. I've followed several docs and another Stack Overflow thread (curl -X POST -d @mapping.json + mapping not created), but it still isn't working for me. Here is what I'm doing:

I'm creating my index with:

curl -XPOST "http://localhost:9200/mongoindex" -d @index.json

index.json:

{
  "settings" : {
      "number_of_shards" : 1
  },
  "analysis" : {
    "analyzer" : {
      "str_search_analyzer" : {
        "tokenizer" : "keyword",
        "filter" : ["lowercase"]
      },
      "str_index_analyzer" : {
         "tokenizer" : "keyword",
         "filter" : ["lowercase", "ngram"]
      }
    },
    "filter" : {
      "ngram" : {
        "type" : "ngram",
        "min_gram" : 2,
        "max_gram" : 20
      }
    }
  }
}

Then running:

curl -XPOST "http://localhost:9200/mongoindex/listing/_mapping" -d @mapping.json

With this data:

{
   "listing":{
      "properties":{
        "_all": {
          "enabled": false
        },
        "title": {
          "type": "string",
          "store": false,
          "index": "not_analyzed"
        },
        "bathrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "bedrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "address": {
          "type": "nested",
          "include_in_parent": true,
          "store": true,
            "properties": {
              "country": {
                "type":"string"
              },
              "city": {
                "type":"string"
              },
              "stateOrProvince": {
                "type":"string"
              },
              "fullStreetAddress": {
                "type":"string"
              },
              "postalCode": {
                "type":"string"
              }
            }
        },
        "location": {
          "type": "geo_point",
          "full_name": "geometry.coordinates",
          "store": true
        }
      }
   }
}

Then finally creating the river with:

curl -XPUT "http://localhost:9200/_river/mongoindex/_meta" -d @river.json

river.json:

{
  "type": "mongodb",
  "mongodb": {
    "db": "blueprint",
    "collection": "Listing",
    "options": {
      "secondary_read_preference": true,
      "drop_collection": true
    }
  },
  "index": {
    "name": "mongoindex",
    "type": "listing"
  }
}

After all that, the river works in that ES is populated, but it's a verbatim copy of Mongo right now. I need to modify the mappings, but my changes are not taking effect. What am I missing?

This is what my mapping looks like after the river runs; it is nothing like what I want it to look like.

[Screenshot: ES mapping]

[Second screenshot]

2
Can you try this again, but before creating the river, get the mappings with this command and add them to your question? curl -XGET 'localhost:9200/_mapping?pretty' – John Petrone
I get the expected result at that point; the mapping is correct after the 2nd command. It's when the river is created that it gets reset. See the 2nd screenshot. – Brian Litzinger

2 Answers

0
votes

I would set dynamic mapping to false:

The dynamic creation of mappings for unmapped types can be completely disabled by setting index.mapper.dynamic to false.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html

Others have had similar issues to yours and it looks like the best solution so far has been to prevent the MongoDB River from dynamically mapping at all:

https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/75
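
As a rough sketch of the setting mentioned above, and assuming it is supplied at index creation time, the settings block of the question's index.json could gain one line. Only the index.mapper.dynamic name comes from the quoted documentation; placing it in this file is my assumption, and the analysis section is omitted here for brevity.

```json
{
  "settings" : {
    "number_of_shards" : 1,
    "index.mapper.dynamic" : false
  }
}
```

Note that the quoted documentation describes this setting as covering unmapped types, so new fields inside an already mapped type may still need "dynamic": false on the type mapping itself.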

0
votes

It turns out the issue was that the dynamic property was left out of the mapping config. It should be set in two places: in index.json, as shown above, and in mapping.json:

{
   "listing":{
      "_source": {
        "enabled": false
      },
      "_all": {
        "enabled": false
      },
      "dynamic": false,      // <--- Need to add this
      "properties":{
        "title": {
          "type": "string",
          "store": false,
          "index_analyzer": "str_index_analyzer"
        },
        "bathrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "bedrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "address": {
          "type": "nested",
          "include_in_parent": true,
          "store": true,
            "properties": {
              "country": {
                "type":"string",
                "index_analyzer": "str_index_analyzer"
              },
              "city": {
                "type":"string",
                "index_analyzer": "str_index_analyzer"
              },
              "stateOrProvince": {
                "type":"string",
                "index_analyzer": "str_index_analyzer"
              },
              "fullStreetAddress": {
                "type":"string",
                "index_analyzer": "str_index_analyzer"
              },
              "postalCode": {
                "type":"string"
              }
            }
        },
        "location": {
          "type": "geo_point",
          "full_name": "geometry.coordinates",
          "store": true
        }
      }
   }
}

As for the 902 docs vs. 451: I think that is a bug in the ElasticSearch Head plugin I'm using to browse documents. The index doesn't have duplicates, but a couple of spots show 902 docs as a summary of sorts.