ElasticSearch - searching different doc_types with the same field name but different analyzers

Question

Let's say I make a simple ElasticSearch index:

curl -XPUT 'http://localhost:9200/test/' -d '{
    "settings": {
        "analysis": {
            "char_filter": {
                "de_acronym": {
                    "type": "mapping",
                    "mappings": [".=>"]
                }
            },
            "analyzer": {
                "analyzer1": {
                    "type":      "custom",
                    "tokenizer": "keyword",
                    "char_filter": ["de_acronym"]
                }
            }
        }
    }
}'

Then I create two doc_types that share a property name but analyze it differently:

curl -XPUT 'http://localhost:9200/test/_mapping/docA' -d '{
    "docA": {
        "properties": {
            "name": {
                "type": "string",
                "analyzer": "simple"
            }
        }
    }
}'
curl -XPUT 'http://localhost:9200/test/_mapping/docB' -d '{
    "docB": {
        "properties": {
            "name": {
                "type": "string",
                "analyzer": "analyzer1"
            }
        }
    }
}'

Next, let's say I put a document in each doc_type with the same name:

curl -XPUT 'http://localhost:9200/test/docA/1' -d '{ "name" : "U.S. Army" }'
curl -XPUT 'http://localhost:9200/test/docB/1' -d '{ "name" : "U.S. Army" }'

Let's try to search for "U.S. Army" in both doc types at the same time:

curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
    "query": {
        "match_phrase": {
            "name": {
                "query": "U.S. Army"
            }
        }
    }
}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.5,
    "hits" : [ {
      "_index" : "test",
      "_type" : "docA",
      "_id" : "1",
      "_score" : 1.5,
      "_source":{ "name" : "U.S. Army" }
    } ]
  }
}

I only get one result! I get the other result when I specify docB's analyzer:

curl -XGET 'http://localhost:9200/test/_search?pretty' -d '
{
    "query": {
        "match_phrase": {
            "name": {
                "query": "U.S. Army",
                "analyzer": "analyzer1"
            }
        }
    }
}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "docB",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{ "name" : "U.S. Army" }
    } ]
  }
}
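
To see why each query matches only one document, it helps to compare the tokens the two analyzers produce. The following is a rough Python emulation, not Elasticsearch's own code. It assumes the built-in "simple" analyzer splits on non-letters and lowercases, and that analyzer1 (the "de_acronym" char filter plus the keyword tokenizer) deletes "." and emits the remaining string as a single token. The function names are made up for illustration.

```python
import re


def simple_analyzer(text):
    # Rough stand-in for the built-in "simple" analyzer:
    # split on anything that is not a letter, then lowercase each piece.
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]


def analyzer1(text):
    # Rough stand-in for the custom analyzer from the index settings:
    # the "de_acronym" char filter deletes ".", and the keyword tokenizer
    # keeps whatever is left as one single token.
    return [text.replace(".", "")]


name = "U.S. Army"
print(simple_analyzer(name))  # tokens indexed for docA
print(analyzer1(name))        # tokens indexed for docB
```

Under these assumptions docA is indexed as the three tokens u, s, army, while docB is indexed as the single token "US Army". A query analyzed with "simple" therefore produces no term that exists in docB, and a query analyzed with analyzer1 produces no term that exists in docA, which is consistent with the two results above.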

I was under the impression that ES would search each doc_type with the appropriate analyzer. Is there a way to do this?

The ElasticSearch docs say that precedence for search analyzer goes:

1) The analyzer defined in the query itself, else

2) The analyzer defined in the field mapping, else ...

In this case, is ElasticSearch arbitrarily choosing which field mapping to use?

Andrei Stefan · Accepted Answer · 2014-12-17T07:14:19

Take a look at this issue on GitHub, which seems to have started from this post in the ES Google Groups. I believe it answers your question:

if its in a filtered query, we can't infer it, so we simply pick one of those and use its analysis settings
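
Since ES picks a single mapping's analyzer for the whole query, one possible workaround is to state the analyzer explicitly per doc_type and combine the clauses. The sketch below only builds the request body; it has not been run against this index. It assumes the query DSL of that era (the `filtered` query and the `type` filter), and the helper name `per_type_phrase_query` is hypothetical.

```python
import json


def per_type_phrase_query(field, text, analyzers_by_type):
    # Build one match_phrase clause per doc_type, each with its own
    # explicit analyzer, and restrict that clause to the matching type.
    clauses = []
    for doc_type, analyzer in sorted(analyzers_by_type.items()):
        clauses.append({
            "filtered": {
                "query": {
                    "match_phrase": {
                        field: {"query": text, "analyzer": analyzer}
                    }
                },
                "filter": {"type": {"value": doc_type}},
            }
        })
    # A document matches if any of the per-type clauses matches.
    return {"query": {"bool": {"should": clauses}}}


body = per_type_phrase_query(
    "name", "U.S. Army", {"docA": "simple", "docB": "analyzer1"}
)
print(json.dumps(body, indent=4))
```

The printed JSON can be passed as the `-d` body of the `_search` request from the question. Giving the two fields different names would avoid the ambiguity altogether.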
