5
votes

We work with two types of documents on elastic search (ES): items and slots, where items are parents of slot documents. We define the index with the following command:

curl -XPOST 'localhost:9200/items' -d @itemsdef.json

where itemsdef.json has the following definition

{
"mappings" : {
    "item" : {
        "properties" : {
            "id" : {"type" : "long" },
            "name" : {
                "type" : "string",
                "_analyzer" : "textIndexAnalyzer"   
            },
            "location" : {"type" : "geo_point" },
        }
    }
},
"settings" : {
    "analysis" : {
        "analyzer" : {

                "activityIndexAnalyzer" : {
                    "alias" : ["activityQueryAnalyzer"],
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textIndexAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textQueryAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
                }       
        },
        "filter" : {        
                "spanish_stop" : {
                    "type" : "stop",
                    "ignore_case" : true,
                    "enable_position_increments" : true,
                    "stopwords_path" : "analysis/spanish-stopwords.txt"
                },
                "spanish_synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "analysis/spanish-synonyms.txt"
                },
                "word_delimiter_impl" : {
                    "type" : "word_delimiter",
                    "generate_word_parts" : true,
                    "generate_number_parts" : true,
                    "catenate_words" : true,
                    "catenate_numbers" : true,
                    "split_on_case_change" : false                  
                }               
        }
    }
}
}

Then we add the child document definition using the following command:

curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json

Where slotsdef.json has the following definition:

{
"slot" : {
    "_parent" : {"type" : "item"},
    "_routing" : {
        "required" : true,
        "path" : "parent_id"
    },
    "properties": {
        "id" : { "type" : "long" },
        "parent_id" : { "type" : "long" },
        "activity" : {
            "type" : "string",
            "_analyzer" : "activityIndexAnalyzer"
        },
        "day" : { "type" : "integer" },
        "start" : { "type" : "integer" },
        "end" :  { "type" : "integer" }
    }
}   
}

Finally we perform a bulk index with the following command:

curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json

Where testbulk.json holds the following data:

{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}

We see through ES Head plugin that definitions seem to be ok. We test the analyzers to check that they have been loaded and they work. Both documents appear listed in ES Head browser view. But if we try to retrieve the child item using the API, ES responds that it does not exist:

$ curl -XGET 'localhost:9200/items/slot/126'
{"_index":"items","_type":"slot","_id":"126","exists":false}

When we import 50 documents, all parent documents can be retrieved through API, but only SOME of the requests for child elements get a successful response.

My guess is that it may have something to do with how docs are stored across shards and the routing...which certainly is not clear to me how it works.

Any clue on how to be able to retrieve individual child documents? ES Head shows they have been stored but HTTP GETs to localhost:9200/items/slot/XXX respond randomly with "exists":false.

1

1 Answers

10
votes

The child documents are using parent's id for routing. So, in order to retrieve child documents you need to specify parent id in the routing parameter on your query:

curl "localhost:9200/items/slot/126?routing=35"

If parent id is not available, you will have to search for the child documents:

curl "localhost:9200/items/slot/_search?q=id:126"

or switch to an index with a single shard.