With Elasticsearch I have created an index using a custom mapping and a custom set of analyzers; however, I'm not able to run a query search on the _all field.

I'm using these analyzers:

{
    "analysis": {
        "analyzer": {
            "case_insensitive": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": [
                    "lowercase",
                    "asciifolding"
                ],
                "char_filter": "punctuation"
            }
        },
        "char_filter": {
            "punctuation": {
                "type": "mapping",
                "mappings": [
                    ".=>\\u0020",
                    "-=>\\u0020",
                    "_=>\\u0020"
                ]
            }
        }
    }
}

and this mapping:

{
    "article": {
        "_all": {
            "enabled": true,
            "store": "yes",
            "index_analyzer": "case_insensitive",
            "search_analyzer": "case_insensitive"
        },
        "properties": {
            "title": {
                "type": "string",
                "index": "analyzed"
            },
            "subtitle": {
                "type": "string",
                "analyzer": "case_insensitive"
            },
            "comment": {
                "type": "string",
                "index": "not_analyzed"
            },
            "review": {
                "type": "string",
                "index": "not_analyzed",
                "include_in_all": false
            }
        }
    }
}

Then I add a document like this:

{
    "title": "This is the story of a wonderful man.", 
    "subtitle":"A man goes on vacation in the worst place possible.",
    "comment": "I like the movie very much, however I did not undertand it.",
    "review":"Very well"
}

I expect three of the four fields to be included in _all: title, subtitle and comment.

The analyzer works as follows (tested using the analyzer test in Elasticsearch):

"I like the movie very much, however I did not undertand it." -> "i like the movie very much, however i did not undertand it "

"This is the story of a wonderful man." -> "this is the story of a wonderful man "

I would expect that searching on _all with the query "This is the story of a wonderful man." would at least find the document.
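For reference, the analyzer outputs quoted above can be reproduced offline with a small Python sketch. This only imitates the case_insensitive analyzer and does not call Elasticsearch. The function name is made up for illustration, and the asciifolding step is approximated with Unicode decomposition, which is an assumption and not the real filter.

```python
import unicodedata

# Same replacements as the "punctuation" char_filter: ".", "-" and "_" become a space.
PUNCTUATION_MAP = {".": " ", "-": " ", "_": " "}


def simulate_case_insensitive(text):
    """Rough imitation of the case_insensitive analyzer (hypothetical helper)."""
    # 1. char_filter: runs before tokenization.
    mapped = "".join(PUNCTUATION_MAP.get(ch, ch) for ch in text)
    # 2. keyword tokenizer: the whole input becomes exactly one token.
    token = mapped
    # 3. lowercase filter.
    token = token.lower()
    # 4. asciifolding, approximated by stripping combining marks (assumption).
    decomposed = unicodedata.normalize("NFKD", token)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [folded]


title_tokens = simulate_case_insensitive("This is the story of a wonderful man.")
print(title_tokens)
```

Note that the result is a list with a single token that still contains the spaces, including the trailing one left by the replaced full stop.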

What am I doing wrong?

How is elasticsearch populating the _all field?

If the field 'title' is added to the _all field, which data is used, and how? Does it use the output of the analyzer selected for the 'title' field as input for the _all analyzer, or does it use the raw data?

How does the data flow into the _all field? For example:

input -> analyzer -> title -> index_analyser -> _all

or

input -> index_analyser -> _all

Thank you in advance...

1 Answer

Your mapping looks ok to me. The only thing I would try is to set one of the fields explicitly to include_in_all=true and then rerun your query.

According to the docs for _all, because you are overriding the default value of include_in_all for one of the fields, it may have changed it for all the other fields of the object.

Relevant text from the documentation is below:

Inclusion in the _all field can be controlled on a field-by-field basis by using the include_in_all setting, which defaults to true. Setting include_in_all on an object (or on the root object) changes the default for all fields within that object.

UPDATE:

I think I know why it's not working. Here is what I did.

First, I removed the custom analysers from the _all field (so it uses the standard analyser). With this I was able to query and get the results as expected. Results were returned for terms that were in any of the document attributes except review. At least this confirms that the general behaviour of _all is correct.

Next, to test the analysers, I ran a query on the subtitle field with the exact text (since it uses the keyword tokenizer). This also worked. Then I realised that _all is an aggregated field, and the aggregate is what gets analysed.

So for a query on _all to match, it would have to include all the text from all the included fields. But again, how do we know in which order they were aggregated :)

This link (_all custom analyser) has some information. The relevant bits, from Shay, are extracted below.

You don't want to set the analyzer for _all to be keyword. _all is an aggregation of all the other fields in the doc, so you basically treat the whole aggregation of text as a single token.
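To make the single-token problem concrete, here is a minimal Python sketch that imitates what happens. It does not talk to Elasticsearch. The helper names are hypothetical, asciifolding is left out for brevity, and joining the fields with one space in mapping order is purely an assumption, since the real aggregation order and separator are internal details.

```python
def analyze_keyword(text):
    """Imitates the question's analyzer: punctuation mapping, keyword tokenizer, lowercase."""
    for ch in ".-_":
        text = text.replace(ch, " ")
    return [text.lower()]  # one token for the whole input


def analyze_tokenized(text):
    """Same filters, but splitting on whitespace instead of using the keyword tokenizer."""
    for ch in ".-_":
        text = text.replace(ch, " ")
    return text.lower().split()


doc = {
    "title": "This is the story of a wonderful man.",
    "subtitle": "A man goes on vacation in the worst place possible.",
    "comment": "I like the movie very much, however I did not undertand it.",
    "review": "Very well",
}

# Assumed aggregation: included fields joined with a space; review has include_in_all=false.
all_text = " ".join(doc[field] for field in ("title", "subtitle", "comment"))
query = "This is the story of a wonderful man."

# Keyword tokenizer on _all: the index holds one huge token, the query another one.
keyword_index_terms = set(analyze_keyword(all_text))
keyword_match = all(term in keyword_index_terms for term in analyze_keyword(query))

# Tokenized _all: every query term is present among the indexed terms.
tokenized_index_terms = set(analyze_tokenized(all_text))
tokenized_match = all(term in tokenized_index_terms for term in analyze_tokenized(query))

print(keyword_match, tokenized_match)
```

With the keyword tokenizer the query token only covers the title, so it cannot equal the single token built from all three fields. Giving _all an analyzer that actually splits the text into terms (the standard analyser, as in the test above) avoids this.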