I want to query for all the documents that match any of the following values:

["Test","Cat","Dog"]

in the field categories.

I have the following mapping:

"categories": {
    "type": "string"
}

A couple of sample documents are

"categories": [
    "Test",
    "Cat"
]

Or

"categories": [
    "Blue Cat",
    "Ball"
]

I was able to pull it off with the following query:

query: {
    match: {
        categories: {
            query: ["Test","Cat","Dog"]
        }
    }
}

But that returns both documents because they both include "Cat", even though one of them only includes it as part of "Blue Cat". How can I specify that I want the exact value "Cat", not just a value that contains it?

I read about changing the field type on the mapping to nested, but an array is not accepted as a nested object since it doesn't have keys and values.

If I use this mapping:

"categories": {
    "type": "nested"
}

I get this error:

"object mapping for [categories] tried to parse field [null] as object, but found a concrete value"

How can I filter by the field categories using an array of possible values and making sure it matches at least one of the values exactly?

1 Answer

Change the field to be "not_analyzed". Right now it's using the default "standard" analyzer, which splits "Blue Cat" into two lowercased tokens, "blue" and "cat". The match query analyzes your search values the same way, so "Cat" becomes "cat", and that's why your query matches the doc containing "Blue Cat".
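To make the tokenization point concrete, here is a minimal Python sketch. It is only a rough model of the behaviour described above, not Elasticsearch's actual implementation: `standard_like_analyze` and `match_any` are hypothetical helper names, and the "analyzer" here just splits on non-word characters and lowercases.

```python
import re

def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def match_any(field_values, query_values):
    """Simplified match semantics: hit if any analyzed query token
    equals any analyzed token stored for the field."""
    indexed = {tok for value in field_values for tok in standard_like_analyze(value)}
    wanted = {tok for value in query_values for tok in standard_like_analyze(value)}
    return bool(indexed & wanted)

doc1 = ["Test", "Cat"]
doc2 = ["Blue Cat", "Ball"]
query = ["Test", "Cat", "Dog"]

print(standard_like_analyze("Blue Cat"))  # ['blue', 'cat']
print(match_any(doc1, query))             # True
print(match_any(doc2, query))             # True, because "Blue Cat" produced the token "cat"
```

In this model the second document matches only because "Blue Cat" was broken into separate tokens at index time, which is exactly what "not_analyzed" prevents.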

Here is the mapping:

{
    "categories": {
        "type":  "string",
        "index": "not_analyzed"
    }
}

I indexed two documents using the above mapping.

{
_index : "test_index",
_type : "test",
_id : "2",
_score : 1,
_source : {
    categories : [
        "Blue Cat",
        "Ball"
    ]
}}, {
_index : "test_index",
_type : "test",
_id : "1",
_score : 1,
_source : {
    categories : [
        "Test",
        "Cat"]
}}

I searched using the query below:

{
"query" : {
    "constant_score" : {
        "filter" : {
            "terms" : { 
                "categories" :["Test","Cat","Dog"]
            }
        }
    }
}}

I get back only the document with _id 1, the one that contains the exact values "Test" and "Cat":

{
"took" : 9,
"timed_out" : false,
"_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
},
"hits" : {
    "total" : 1,
    "max_score" : 1,
    "hits" : [{
            "_index" : "test_index",
            "_type" : "test",
            "_id" : "1",
            "_score" : 1,
            "_source" : {
                "categories" : [
                    "Test",
                    "Cat"
                ]
            }
        }
    ]
}}
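The result above can be reproduced with a small Python model of a terms filter over a not_analyzed field. This is a sketch under the assumption that each stored value is kept as one untouched term and compared exactly. `terms_filter` is a hypothetical helper, not part of any Elasticsearch client.

```python
def terms_filter(field_values, wanted_terms):
    """Exact, case-sensitive membership: each stored value is a single term."""
    return any(value in wanted_terms for value in field_values)

# The two sample documents, keyed by _id.
docs = {
    "1": ["Test", "Cat"],
    "2": ["Blue Cat", "Ball"],
}
wanted = {"Test", "Cat", "Dog"}

hits = [doc_id for doc_id, cats in docs.items() if terms_filter(cats, wanted)]
print(hits)                           # ['1']
print(terms_filter(["cat"], wanted))  # False: no lowercasing happens
```

Because nothing is lowercased on either side, the search values have to match the stored values character for character, including case.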